- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Wed, 24 Feb 2010 08:38:13 -0500
- To: <public-hypertext-cg@w3.org>
This call included a special presentation on the topic
of "Voice on the Web", or how voice input can be better
utilized in web applications.
The slides from the presentation can be found at:
1. slides with notes:
http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG_notes.pdf
2. without notes:
http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG.pdf
(Member-only) link to the HTML-formatted minutes:
http://www.w3.org/2010/02/12-hcg-minutes.html
Note that there are other member links in the minutes.
and below as text:
[1]W3C
[1] http://www.w3.org/
- DRAFT -
Hypertext Coordination Group Teleconference
12 Feb 2010
[2]Agenda
[2]
http://lists.w3.org/Archives/Member/w3c-html-cg/2010JanMar/0041.html
See also: [3]IRC log
[3] http://www.w3.org/2010/02/12-hcg-irc
Attendees
Present
Dan_Burnett, Kazuyuki, ChrisL, glazou, DKA, Matt_Womer,
Debbie_Dahl, Jim_Larson, Plh, Janina, Doug_Schepers, darobin,
Paul_Cotton, Bert, Dsr, Michael_Cooper
Regrets
Chair
Chris
Scribe
Matt
Contents
* [4]Topics
1. [5]Actions
2. [6]send HCG agendas to the public list?
3. [7]Review of voice technologies and applications
4. [8]Q&A
5. [9]Workshop Announcement
6. [10]Doug's questions
* [11]Summary of Action Items
_________________________________________________________
<trackbot> Date: 12 February 2010
Actions
<ChrisL> [12]http://www.w3.org/MarkUp/CoordGroup/track/actions/open
[12] http://www.w3.org/MarkUp/CoordGroup/track/actions/open
<ChrisL> action-32?
<trackbot> ACTION-32 -- Deborah Dahl to follow up on scxml
implementations from KDE -- due 2009-09-30 -- OPEN
<trackbot> [13]http://www.w3.org/MarkUp/CoordGroup/track/actions/32
[13] http://www.w3.org/MarkUp/CoordGroup/track/actions/32
ChrisL: ACTION-32 for Debbie
ddahl: I haven't done anything myself, but saw ChrisL's note about
implementations.
ChrisL: First post has links to the person who posted it.
ddahl: I think he was in the VB group actually.
kaz: Yes.
ddahl: I'll follow up on that.
<ChrisL> action-42?
<trackbot> ACTION-42 -- Chris Lilley to create telcon time WBS
survey -- due 2010-01-22 -- OPEN
<trackbot> [14]http://www.w3.org/MarkUp/CoordGroup/track/actions/42
[14] http://www.w3.org/MarkUp/CoordGroup/track/actions/42
ChrisL: ACTION-42, I have not created it yet; not sure which times
to suggest. I did write Cameron, who wrote a tool for it.
... It let you paint in the colors for your availability, then has a
backend app that figures out the best time.
... I don't want to create an unusable 24/7 table in WBS.
... The backend converts it to a single timezone. Still in progress.
... ACTION-43 on Doug
ACTION-43?
<trackbot> ACTION-43 -- Doug Schepers to start a wiki page to
summarize the 2 goals, with Use cases and requirements for each goal
-- due 2010-01-22 -- OPEN
<trackbot> [15]http://www.w3.org/MarkUp/CoordGroup/track/actions/43
[15] http://www.w3.org/MarkUp/CoordGroup/track/actions/43
shepazu: Looked into this last time. There's already a QA page/wiki
that is extensive and covers those two cases.
<ChrisL> close action-43
<trackbot> ACTION-43 Start a wiki page to summarize the 2 goals,
with Use cases and requirements for each goal closed
shepazu: I'll dig up a link and close the action.
ACTION-48?
<trackbot> ACTION-48 does not exist
<plh> [16]http://www.w3.org/Guide/binding-license.html
[16] http://www.w3.org/Guide/binding-license.html
ACTION-28?
<trackbot> ACTION-28 -- Philippe Le Hégaret to write binding-license
advice for /Guide -- due 2010-02-12 -- OPEN
<trackbot> [17]http://www.w3.org/MarkUp/CoordGroup/track/actions/28
[17] http://www.w3.org/MarkUp/CoordGroup/track/actions/28
ChrisL: ACTION-28 on plh, closed, plh sent the link.
<ChrisL> close ACTION-28
<trackbot> ACTION-28 write binding-license advice for /Guide closed
<shepazu>
[18]http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance
[18] http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance
plh: Text on the page says it all, but if you don't understand it,
please send feedback.
action-44?
<trackbot> ACTION-44 -- Philippe Le Hégaret to look into funding for
browser testing from Web Foundation and NIST -- due 2010-01-22 --
OPEN
<trackbot> [19]http://www.w3.org/MarkUp/CoordGroup/track/actions/44
[19] http://www.w3.org/MarkUp/CoordGroup/track/actions/44
ChrisL: Action-45, ddahl to post minutes.
ddahl: Done.
<ChrisL> close action-45
<trackbot> ACTION-45 Post previous minutes (last meeting) on public
hcg list closed
ChrisL: ACTION-46 on Doug.
ACTION-46?
<trackbot> ACTION-46 -- Doug Schepers to followup with possible spec
revision and explanation of how this relates to XForms -- due
2010-02-05 -- OPEN
<trackbot> [20]http://www.w3.org/MarkUp/CoordGroup/track/actions/46
[20] http://www.w3.org/MarkUp/CoordGroup/track/actions/46
shepazu: It's complete, sent some comments and replies, changed the
DOM3 Events spec. Sent explanation to public hypertext and ??.
ChrisL: That list isn't tracked by tracker.
... It'd be useful to have the link in the action, but we can close
it.
<ChrisL> close action-46
<trackbot> ACTION-46 Followup with possible spec revision and
explanation of how this relates to XForms closed
send HCG agendas to the public list?
ChrisL: ddahl, what's the issue with that?
ddahl: There was a message from the last call about sending these to
the public list, but there are some things that we should work out.
... Like, the agendas are full of member-only links. Just warning
them is fine. Also, hardly anyone is subscribed to the public list
-- just 4 subscribers.
ChrisL: If they want immediate notification, they can subscribe, but
more importantly there's an archive that you can point to.
... This establishes some public accountability.
ddahl: This was about agendas, not minutes. I think people in the
HCG should be subscribing to it if they want agendas.
shepazu: It's easy to subscribe the current list of subscribers to
the new list.
ddahl: Could be that missing the agenda will force people to join
the list.
shepazu: Could post it to both. SVG does that.
ddahl: I don't mind posting it to both.
ChrisL: Publish it to the public list and BCC to the member list?
[[notes that bcc makes filtering annoying.]]
ChrisL: Posting agendas to the normal list has worked fine. Have
there been lots of calls for the agendas to be public?
ddahl: The motivation was that the agenda is in the minutes, people
would follow the link and it would be member only.
<shepazu> (I note that if we just made the group public, we wouldn't
have to have this conversation)
ChrisL: I think sending to the public list and BCCing to the member
list is fine.
... Resolution for that?
ddahl: Sounds good to me.
Review of voice technologies and applications
<ChrisL> Dan Burnett,
<ChrisL> Voice Browser Working Group
<ChrisL>
[21]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[21] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
burn: I am the co-chief editor (w/ Scott McGlashan) of VoiceXML 3. I
joined the VBWG in 1999, and have been involved in almost all of the
specifications from the group.
ChrisL: Is this discussion public? Let's agree now.
ddahl: I think the VBWG and MMI would like a chance to be able to
register an objection.
ChrisL: We'll leave it member for now, and make the rest public.
<glazou> kaz: np
burn: Slide 2, why I am here: we have a strong belief that voice
technology is not utilized as broadly in Web applications today
because many Web developers aren't aware that it exists or what can
be done with it.
... I'll talk about why this is in a moment, but this is something
we want to see change. W3C is the expert group for Web applications
in general, so we believe this is the right group to target.
... Slide 3.
... Everyone knows what HTML is for -- providing a visual UI for the
Web. VoiceXML is the same thing, but for aural user interfaces for
the Web.
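[To make the HTML analogy concrete -- a minimal sketch, not part of the presentation: a VoiceXML 2.0 document renders a spoken prompt much as a minimal HTML page renders text.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form is the basic dialog unit, analogous to an HTML page body -->
  <form id="hello">
    <block>
      <!-- The prompt is rendered aurally via text-to-speech -->
      <prompt>Hello from the voice Web.</prompt>
    </block>
  </form>
</vxml>
```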
... We all know that a visual output is quite rich, quite useful,
but there are many applications for which mouse/touch input isn't
the best, and voice may be better.
... Voice does an extremely good job on cutting through form
filling.
... Purchasing something online for example, you find something you
are interested in, you are presented with a billing/payment screen,
etc. But a number of times I've really wished to use a particular
kind of payment, e.g. Discover card. In many cases I want to know
that before the payment screen.
... It's possible with a voice interface to allow someone to present
that upfront.
... Even though you're asking them what item they're interested in,
you can still say "I want to use my Discover card" -- it's a great
advantage to the customer to do it right up front.
... Mobile search is another area.
... Localized search on mobile devices: one of the challenges is
that though people text frequently, mobile devices are not conducive
to typing lots of information.
... For instance, you want to find a sweater, not a particular
store, just where there are sweaters, and because of where you're
heading you want just sweaters near the Coliseum. Voice is very
convenient for search refinement.
... The last example is similar, voice provides a lot of opportunity
for disambiguation. e.g. browsing on a site, where you have an
option to look for something by color or by style, and really what
you want is both.
... It makes sense to provide simple categories for people doing a
visual search, but it is convenient to be able to refine and
disambiguate.
... You may be at a screen that shows you the blue search, or
another showing the search with ruffles, but blue with ruffles is
what you want.
<ChrisL> paul,
[22]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[22] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
burn: Another reason we believe this group should be interested in
voice technology is that we have discovered that the needs of
accessibility users are often very similar to the needs of someone
whose eyes and hands are occupied.
... This is something voice can very much help with, on both the
input and output sides.
... Slide 4.
... 2 main goals, not going to talk about specifications here.
Ultimately we want to make sure that voice technologies are used on
the Web.
... The second goal is 'Make the simple easy and the complex
possible'. We use this phrase all the time in the VBWG because, I
think, of the perceived complexity of voice technologies.
... It's difficult to walk that line between power and complexity.
This is relevant to some of the initiatives we've had.
... Slide 5.
... Web folks may not be aware of what our voice technologies can
do. We're one of the older groups at W3C. We tend to shrink and grow
over time, and are often quite large.
... Originally the needs of the WG were driven by the needs of call
centers. These are the places you call into when you want support or
queries, etc. These operations are usually large and expensive.
... There was a notable absence of standards. The Java community had
created some standards, Microsoft had created an API that was used
by some, etc, but there wasn't a broad standard.
... Many of these APIs didn't provide for scalability. In a Call
Center environment there may be hundreds to thousands of calls at
the same time. Scalability had to be taken into account.
... Anyone who has built Web servers knows this. This wasn't
understood so well in the telephony industry. The need for
scalability was something that those in the group understood from
day 1.
... Another thing that came from the needs of the day: voice
rendering, as either output or input, was very expensive at the time
and did not run well on small devices.
... It was extremely important that the rendering be able to be
handled in the network.
... It wasn't much like how the original Web was working, but it is
very similar to how Web 2.0 with AJAX, etc is working.
... At the time, the idea of basing this all on Web standards was a
novel idea.
... At the time, telephony folks had large proprietary boxes that
were installed in the phone network somewhere.
... We were very interested at the time to bring the Web
architecture to this technology space.
... So, it was very important to build it on the Web architecture
from the beginning.
... It was a great selling point, we could talk with prospective
customers, let them know that their existing backend infrastructure
would work with a new voice interface.
... Slide 6.
... Today the VBWG is being driven very heavily by mobile voice
needs.
... All of the companies that provide VoiceXML technologies, they
are all interested in mobile devices.
... What does that mean for our group? It means we're paying a lot
more attention to multimodal applications.
... I hesitate to use that buzzword -- any application that has
something you can look at, touch, hear, listens to you, is
multimodal. It doesn't even have to be both visual and aural.
... Could be something that uses geolocation as a modality.
... But in the VBWG we are very interested in small screen devices,
or situations where people's eyes or hands may be busy.
... So we're very much driven today, by this.
... Local context and search is something else we need to be
integrated with.
... There's also a new breed of developers. Some of them may have
never programmed for other devices before. It's important for us
that we present this to them, even in simpler ways than are
available today.
<shepazu> (in short, they want VoiceXML iPhone apps :) )
burn: This is based on conversations we've had with others at TPAC,
etc. Others think the group is focused purely on Call Center
telephony.
... Slide 7.
... To toot our own horn here... people say when they start a voice
app to ignore all the stuff we do and just do the simple stuff. But
the simple stuff covers only 50% of what you want. To get to 95% of
what you want to do requires a whole heck of a lot of work.
... There's a lot of expertise in the VBWG on voice technologies and
the Web based standards for voice. We're particularly aware of the
network constraints as well.
... It's still important for us to support networked processing.
... The devices are more powerful, sure, but the processing is
getting more complicated.
... Imprecision is something voice folks know well. I mean
imprecision in the scientific definition.
... Look at geolocation. Whether it's wifi, cell tower or GPS based,
there's an error factor to it, an imprecision to it.
... The same is true from voice systems. There's an imprecision to
it, that doesn't mean it's not useful information.
... We have a lot of experience in knowing how to properly exploit
confidence information, when using an imprecise technology.
... When we talk we can explain things, we can say many things,
sometimes without saying anything. We have a lot of experience in
how to not only deal with inputs given to the technology but how to
encourage people to provide inputs that work better with the
technology.
... Slide 8
... Going to talk about myths that surround ASR and TTS.
... I use the word lies because after you leave this meeting, if you
spread it, you're lying!
... First up: speech is too error-prone.
... Has anyone not used speech technology?
... One person has said they haven't.
<DKA> FYI I just used google voice search last night on a mobile
(Android) device and it worked very well (on a noisy street-corner).
<glazou> darobin: I have at least 3 devices able to speak and
recognize speech but I never enabled it
<glazou> darobin: even my car does that
burn: I'm not sure I believe it, but it's almost impossible to call
somewhere without having an opportunity to speak rather than push
buttons.
<ChrisL> use of voice menus may vary around the world
ChrisL: Some of that could be geographical restrictions.
burn: Recognition on small mobile devices is probably not as good as
that in the network. The state of the art has progressed a lot, but
in telephony it can take a few years.
<DKA> (but I have a flat mid-atlantic american accent)
burn: Goog-411 for instance.
... Specifically GOOG-411 is there to collect samples to improve
their speech recognition.
... It's increasingly accurate, and as long as you are using it for
the cases for which it was intended, it works fairly well for the
majority of people.
... The 2nd myth is that mobile device screens will be the primary
input.
... Voice has been the primary input for a while.
??: before they even had screens.
burn: We have a strong belief that voice will remain a strong
component for a long time.
... The last myth is about Web developers.
... We hear it's too complicated, etc.
... The best people to use this technology today are Web developers.
... I've worked at companies that both provide and use Voice
technology.
burn: A big plus has been to be able to tap into the Web development
community.
... Making them more accessible to the average Web developer is
something we continue to do.
... Slide 9
... What are we working on today?
... Three main high level goals: get out into the world what voice
can do, what it does today in the call center world, what it does
today in the mobile world, and what we expect it to do in the future
of the Web.
... We want future languages at W3C, including ours, to play well
together.
... We haven't seen much integration between specs.
... This is something we've been giving a lot of attention to since
the TPAC. There's been an increasing interest in this.
ChrisL: We want to increase the flexibility of the language today.
burn: We're making quite a number of changes that generalize aspects
of VoiceXML that are hard coded today.
... In VoiceXML 3 we're generalizing it in ways that allow
developers to create a paradigm for your dialog construct, and then
allow other developers to make use of that paradigm.
... It will simplify the coding for the second group of developers.
... Slide 10
... We need you!
... Don't just pass this along to your WGs (but do that too), we
need you to do it.
... I've listed three sites with free access to developers.
... They provide free hosting to try out Voice applications.
... Each one has a tutorial for how to build these.
... e.g. build a front end to your phone, something that lets
callers chose to leave a message, ask for a callback, whatever. It's
easy to do.
... Try it out. There are a lot of cool things you can do without a
lot of work.
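[A sketch of the phone front end described above -- a hypothetical, untested VoiceXML 2.0 fragment; the dialog names and prompts are invented, and real hosted applications would add error handling and a transfer or submit step.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Top-level menu: route the caller by what they say -->
  <menu id="frontend">
    <prompt>Say message to leave a message, or callback to
      request a call back.</prompt>
    <choice next="#leave_message">message</choice>
    <choice next="#request_callback">callback</choice>
    <noinput>Sorry, I didn't hear you. <reprompt/></noinput>
  </menu>

  <!-- Record an audio message from the caller -->
  <form id="leave_message">
    <record name="msg" beep="true" maxtime="60s">
      <prompt>Please leave your message after the beep.</prompt>
    </record>
  </form>

  <!-- Collect a callback number using the builtin digits grammar -->
  <form id="request_callback">
    <field name="phone" type="digits">
      <prompt>Please say or key in your callback number.</prompt>
      <filled>
        <prompt>Thanks. We will call you back.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```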
... We need people on this call to work with us on building a
combined group of people to build the best future applications.
... VBWG does not contain HTML experts.
... The HTML WG probably does not contain the world's experts on
VoiceXML.
... We think it'd be really helpful to get a subset of the two
groups together to figure out how best to build these combined
applications in the future.
... Learn about voice, it's not that hard.
<Zakim> janina, you wanted to say i have too much frustration with
it, though
burn: Figure out how to create a combined group of experts for
building future applications of the Web.
Q&A
janina: Thanks for the overview and the mentions of accessibility.
... I think of two kinds of things that might be helpful around
imprecision and ambiguity..
... Is anyone experimenting with military style alphabets?
... And is anyone building in changes to adapt to the users' working
rate? e.g. bargein.
<shepazu> (Hotel Tango Foxtrot)
janina: Even if you barge in, the system still tends to speak
slowly, which is fine for 90% of users, but blind users know you
don't get much work done at the slower speech rate.
... I'd get frustrated quickly and wouldn't do very much.
burn: Last one first: VoiceXML is working on Real Time Controls
(RTCs), specialized grammars that act instantly and can cause
certain things to occur.
... The primary use cases are speeding up/slowing down/volume
adjustments, etc.
... It's not easy to do today in VoiceXML 2, but it is something we
are improving in the language.
... Military style alphabets: that is definitely something that is
more of an application design issue.
... One of the situations where we find that being used more often
is when someone needs to give an id, or a policy number that is not
just numeric but alphanumeric.
... Voxeo, my company, does recommend that.
... Nuance, recommended that more than ten years ago, spending a lot
of time figuring out what words people tend to use, not just the
military alphabet.
ChrisL: Rather than having to spell out stock symbols, it's probably
easier to say the company name.
burn: Well, the policy numbers is a better example perhaps.
janina: The thing I was trying would recognize ??
<Zakim> ChrisL, you wanted to ask about separate voice interfaces vs
multimodal
ChrisL: There is this impression that the voice folks work on one
side to make call centers better, yet that has little to do with the
Web.
... But the idea of having a multi-modal site that is primarily
visual, but you want people to be able to fill in one of the forms
by saying 'pick this field and fill it in', I don't see much
deployment of that.
... That is traditionally seen as up to the browser.
... It's treated like a second-class translation, something the Web
developer shouldn't have to do anything about.
burn: I think that has been a limitation for desktop based devices.
Historically, people using it often didn't have speakers or
microphones, or didn't have them turned on, but that's not the case
on mobile.
... On the mobile it's often the case that the mic and the screen
are on simultaneously.
... The voice interface, due to how we learned to speak, is not
nearly as simple as taking a Web page and making those words
available.
... Through experience we've discovered that it's important to have
control, explicitly over the Voice interface.
... You create a Web page, and maybe you have a fancy browser that
lets you talk to it; that's great, but to make a voice user
interface that is efficient for people, you really want to code it
directly.
ChrisL: I didn't want to imply that they wanted to do it that way.
It's just seen as a client coding issue.
<burn> yes, we're out of time, but I can stay as long as the group
wants
<Zakim> shepazu, you wanted to mention DAP
Workshop Announcement
<dsr> W3C invites people to participate in a Workshop on Future
Standards for Model-Based User Interfaces on 13-14 May 2010 in Rome.
This Workshop will examine the challenges facing Web developers due
to variations in device capabilities, modes of interaction and
software standards, the need to support assistive technologies for
accessibility, and the demand for richer user interfaces.
<dsr> The Workshop will focus on reviewing research on model-based
design of context-sensitive user interfaces in relation to these
challenges, and the opportunities for new open standards in the area
of Model-Based User Interfaces. The Workshop, hosted by CNR-ISTI[23],
is free of charge and open to anyone, subject to review of their
statement of interest and space availability. Statements of Interest
are due 2 April 2010. See the call for participation[24] for more
information.
dsr: The Model Based UI Incubator Group is hosting a workshop
mid-May.
[23] http://www.isti.cnr.it/
[24] http://www.w3.org/2010/02/mbui/cfp.html
ChrisL: Please continue discussing on the list.
Doug's questions
shepazu: Shouldn't this be made public?
burn: Personally, I agree, I doubt there's anything that would be a
problem, but we just need a quick check with the groups.
... I'm also going to be doing a project review with w3c staff.
... Absolutely we want this to get out, even though this
presentation was definitely targeted at this group.
shepazu: I think you need to be talking to DAP, they're covering
cameras, and I assume they either are or should be using microphones
as well.
shepazu: I think this is tightly integrated with MMI, but with the
momentum behind Web apps, I agree that you need to work with a
larger number of groups.
burn: That was our belief and what we want to do.
<kaz> [adjourned]
Summary of Action Items
[End of minutes]
Received on Wednesday, 24 February 2010 13:38:53 UTC