- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Wed, 24 Feb 2010 08:38:13 -0500
- To: <public-hypertext-cg@w3.org>
This call included a special presentation on the topic of "Voice on the Web", or how voice input can be better utilized in web applications.

The slides from the presentation can be found at:
1. slides with notes: http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG_notes.pdf
2. without notes: http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG.pdf

(member) link to HTML formatted minutes: http://www.w3.org/2010/02/12-hcg-minutes.html
Note that there are other member links in the minutes.

And below as text:

[1]W3C
[1] http://www.w3.org/

- DRAFT -

Hypertext Coordination Group Teleconference
12 Feb 2010

[2]Agenda
[2] http://lists.w3.org/Archives/Member/w3c-html-cg/2010JanMar/0041.html

See also: [3]IRC log
[3] http://www.w3.org/2010/02/12-hcg-irc

Attendees

   Present
          Dan_Burnett, Kazuyuki, ChrisL, glazou, DKA, Matt_Womer,
          Debbie_Dahl, Jim_Larson, Plh, Janina, Doug_Schepers, darobin,
          Paul_Cotton, Bert, Dsr, Michael_Cooper

   Regrets

   Chair
          Chris

   Scribe
          Matt

Contents

     * [4]Topics
         1. [5]Actions
         2. [6]send HCG agendas to the public list?
         3. [7]Review of voice technologies and applications
         4. [8]Q&A
         5. [9]Workshop Announcement
         6. [10]Doug's questions
     * [11]Summary of Action Items
   _________________________________________________________

<trackbot> Date: 12 February 2010

Actions

<ChrisL> [12]http://www.w3.org/MarkUp/CoordGroup/track/actions/open
[12] http://www.w3.org/MarkUp/CoordGroup/track/actions/open

<ChrisL> action-32?
<trackbot> ACTION-32 -- Deborah Dahl to follow up on scxml implementations from KDE -- due 2009-09-30 -- OPEN
<trackbot> [13]http://www.w3.org/MarkUp/CoordGroup/track/actions/32
[13] http://www.w3.org/MarkUp/CoordGroup/track/actions/32

ChrisL: ACTION-32 for Debbie
ddahl: I haven't done anything myself, but saw ChrisL's note about implementations.
ChrisL: First post has links to the person who posted it.
ddahl: I think he was in the VB group actually.
kaz: Yes.
ddahl: I'll follow up on that.

<ChrisL> action-42?
<trackbot> ACTION-42 -- Chris Lilley to create telcon time WBS survey -- due 2010-01-22 -- OPEN
<trackbot> [14]http://www.w3.org/MarkUp/CoordGroup/track/actions/42
[14] http://www.w3.org/MarkUp/CoordGroup/track/actions/42

ChrisL: ACTION-42, I have not created it yet; not sure which times to suggest. I did write Cameron, who wrote a tool for it.
... It lets you paint in the colors for your availability, then has a backend app that figures out the best time.
... I don't want to create an unusable 24/7 table in WBS.
... The backend converts it to a single timezone. Still in progress.
... ACTION-43 on Doug

ACTION-43?
<trackbot> ACTION-43 -- Doug Schepers to start a wiki page to summarize the 2 goals, with Use cases and requirements for each goal -- due 2010-01-22 -- OPEN
<trackbot> [15]http://www.w3.org/MarkUp/CoordGroup/track/actions/43
[15] http://www.w3.org/MarkUp/CoordGroup/track/actions/43

shepazu: Looked into this last time. There's already a QA page/wiki that is extensive and covers those two cases.

<ChrisL> close action-43
<trackbot> ACTION-43 Start a wiki page to summarize the 2 goals, with Use cases and requirements for each goal closed

shepazu: I'll dig up a link and close the action.

ACTION-48?
<trackbot> ACTION-48 does not exist

<plh> [16]http://www.w3.org/Guide/binding-license.html
[16] http://www.w3.org/Guide/binding-license.html

ACTION-28?
<trackbot> ACTION-28 -- Philippe Le Hégaret to write binding-license advice for /Guide -- due 2010-02-12 -- OPEN
<trackbot> [17]http://www.w3.org/MarkUp/CoordGroup/track/actions/28
[17] http://www.w3.org/MarkUp/CoordGroup/track/actions/28

ChrisL: ACTION-28 on plh, closed, plh sent the link.

<ChrisL> close ACTION-28
<trackbot> ACTION-28 write binding-license advice for /Guide closed

<shepazu> [18]http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance
[18] http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance

plh: Text on the page says it all, but if you don't understand it, please send feedback.
action-44?
<trackbot> ACTION-44 -- Philippe Le Hégaret to look into funding for browser testing from Web Foundation and NIST -- due 2010-01-22 -- OPEN
<trackbot> [19]http://www.w3.org/MarkUp/CoordGroup/track/actions/44
[19] http://www.w3.org/MarkUp/CoordGroup/track/actions/44

ChrisL: Action-45, ddahl to post minutes.
ddahl: Done.

<ChrisL> close action-45
<trackbot> ACTION-45 Post previous minutes (last meeting) on public hcg list closed

ChrisL: ACTION-46 on Doug.

ACTION-46?
<trackbot> ACTION-46 -- Doug Schepers to followup with possible spec revision and explanation of how this relates to XForms -- due 2010-02-05 -- OPEN
<trackbot> [20]http://www.w3.org/MarkUp/CoordGroup/track/actions/46
[20] http://www.w3.org/MarkUp/CoordGroup/track/actions/46

shepazu: It's complete, sent some comments and replies, changed the DOM3 Events spec. Sent explanation to public hypertext and ??.
ChrisL: That list isn't tracked by tracker.
... It'd be useful to have the link in the action, but we can close it.

<ChrisL> close action-46
<trackbot> ACTION-46 Followup with possible spec revision and explanation of how this relates to XForms closed

send HCG agendas to the public list?

ChrisL: ddahl, what's the issue with that?
ddahl: There was a message from the last call about sending these to the public list, but there are some things that we should work out.
... Like, the agendas are full of member-only links. Just warning them is fine. Also, almost no one is subscribed to the list -- just 4 subscribers.
ChrisL: If they want immediate notification, they can subscribe, but more importantly there's an archive that you can point to.
... This establishes some public accountability.
ddahl: This was about agendas, not minutes. I think people in the HCG should be subscribing to it if they want agendas.
shepazu: It's easy to subscribe the current list of subscribers to the new list.
ddahl: Could be that missing the agenda will force people to join the list.
shepazu: Could post it to both. SVG does that.
ddahl: I don't mind posting it to both.
ChrisL: Publish it to the public list and BCC to the member list? [[notes that bcc makes filtering annoying.]]
ChrisL: Posting agendas to the normal list has worked fine. Have there been lots of calls for the agendas to be public?
ddahl: The motivation was that the agenda is in the minutes; people would follow the link and it would be member only.

<shepazu> (I note that if we just made the group public, we wouldn't have to have this conversation)

ChrisL: I think sending to the public list and BCCing to the member list is fine.
... Resolution for that?
ddahl: Sounds good to me.

Review of voice technologies and applications

<ChrisL> Dan Burnett,
<ChrisL> Voice Browser Working Group
<ChrisL> [21]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[21] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf

burn: I am the co-chief editor (w/ Scott McGlashan) of VoiceXML 3. I joined the VBWG in 1999, and have been involved in almost all of the specifications from the group.
ChrisL: Is this discussion public? Let's agree now.
ddahl: I think the VBWG and MMI would like a chance to be able to register an objection.
ChrisL: We'll leave it member for now, and make the rest public.

<glazou> kaz: np

burn: Slide 2, why I am here: we have a strong belief that voice technology is not utilized as broadly in Web applications today because many Web developers aren't aware that it exists or what can be done with it.
... I'll talk about why this is in a moment, but this is something we want to see change. W3C is the expert group for Web applications in general, so we believe this is the right group to target.
... Slide 3.
... Everyone knows what HTML is for -- providing a visual UI for the Web. VoiceXML is the same thing, but for aural user interfaces for the Web.
... We all know that a visual output is quite rich, quite useful, but there are many applications for which mouse/touch input isn't the best, and voice may be better.
... Voice does an extremely good job of cutting through form filling.
... Purchasing something online, for example: you find something you are interested in, you are presented with a billing/payment screen, etc. But a number of times I've really wished to use a particular kind of payment, e.g. a Discover card. In many cases I want to make that known before the payment screen.
... It's possible with a voice interface to allow someone to present that upfront.
... Even though you're asking them what item they're interested in, you can still say "I want to use my Discover card" -- it's a great advantage to the customer to do it right up front.
... Mobile search is another area.
... With localized search on mobile devices, one of the challenges is that though people text frequently, phones are not conducive to typing lots of information.
... For instance, you want to find a sweater, not a particular store, just where there are sweaters, and because of where you're heading you want just sweaters near the Coliseum. Voice is very convenient for search refinement.
... The last example is similar: voice provides a lot of opportunity for disambiguation. E.g. browsing on a site, where you have an option to look for something by color or by style, and really what you want is both.
... It makes sense to provide simple categories for people doing a visual search, but it is convenient to be able to refine and disambiguate.
... You may be at a screen that shows you the blue search, or another showing the search with ruffles, but blue with ruffles is what you want.
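[Editorial aside: the "state your payment up front" scenario described above is what VoiceXML 2.0 calls a mixed-initiative form -- an <initial> prompt plus a form-level grammar lets the caller fill any field, in any order, in a single utterance. A minimal sketch, not from the talk; the grammar URI, field names, and submit target are illustrative:]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Form-level grammar: one utterance can fill several fields,
         e.g. "a blue sweater, and I'll pay with my Discover card". -->
    <grammar src="order.grxml" type="application/srgs+xml"/>
    <initial name="start">
      <prompt>What would you like to order?</prompt>
    </initial>
    <!-- Any field not filled by the initial utterance is prompted for. -->
    <field name="item">
      <prompt>Which item are you interested in?</prompt>
    </field>
    <field name="payment">
      <prompt>How would you like to pay?</prompt>
    </field>
    <filled mode="all">
      <submit next="order.cgi" namelist="item payment"/>
    </filled>
  </form>
</vxml>
```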
<ChrisL> paul, [22]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[22] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf

burn: The reason we believe this group should be interested in voice technology is that we have discovered that the needs of accessibility are often very similar to the needs of someone whose eyes and hands are occupied.
... This is something voice can very much help with, on both the input and output sides.
... Slide 4.
... 2 main goals; not going to talk about specifications here. Ultimately we want to make sure that voice technologies are used on the Web.
... The second goal is 'Make the simple easy and the complex possible'. We use this phrase all the time in the VBWG because, I think, of the perceived complexity of voice technologies.
... It's difficult to walk that line between power and complexity. This is relevant to some of the initiatives we've had.
... Slide 5.
... Web folks may not be aware of what our voice technologies can do. We're one of the older groups at W3C. We tend to shrink and grow over time, and are often quite large.
... Originally the work of the WG was driven by the needs of call centers. These are the places you call into for support, queries, etc. These operations are usually large and expensive.
... There was a notable absence of standards. The Java community had created some standards, Microsoft had created an API that was used by some, etc., but there wasn't a broad standard.
... Many of these APIs didn't provide for scalability. In a call center environment there may be hundreds to thousands of calls at the same time. Scalability had to be taken into account.
... Anyone who has built Web servers knows this. This wasn't understood so well in the telephony industry. The need for scalability was something that those in the group understood from day 1.
... Another thing that came from the needs of the day: voice rendering, as either output or input, was very expensive at the time (it did not run well on small devices).
... It was extremely important that the rendering be able to be handled in the network.
... It wasn't much like how the original Web was working, but it is very similar to how Web 2.0, with AJAX etc., is working.
... At the time, the idea of basing this all on Web standards was a novel idea.
... At the time, telephony folks had large proprietary boxes that were installed in the phone network somewhere.
... We were very interested at the time to bring the Web architecture to this technology space.
... So, it was very important to build it on the Web architecture from the beginning.
... It was a great selling point: we could talk with prospective customers and let them know that their existing backend infrastructure would work with a new voice interface.
... Slide 6.
... Today the VBWG is being driven very heavily by mobile voice needs.
... All of the companies that provide VoiceXML technologies are interested in mobile devices.
... What does that mean for our group? It means we're paying a lot more attention to multimodal applications.
... I hesitate to use that buzzword -- any application that has something you can look at, touch, or hear, or that listens to you, is multimodal. It doesn't even have to be both visual and aural.
... It could be something that uses geolocation as a modality.
... But in the VBWG we are very interested in small screen devices, or situations where people's eyes or hands may be busy.
... So we're very much driven today by this.
... Local context and search is something else we need to be integrated with.
... There's also a new breed of developers. Some of them may have never programmed for other devices before. It's important for us that we present this to them, even in simpler ways than are available today.
<shepazu> (in short, they want VoiceXML iPhone apps :) )

burn: This is based on conversations we've had with others at TPAC, etc. Others think the group is focused purely on call center telephony.
... Slide 7.
... To toot our own horn here... people say when they start a voice app to ignore all the stuff we do, and just do the simple stuff. But the simple stuff covers just 50% of what you want. To get to 95% of what you want to do requires a whole heck of a lot of work.
... There's a lot of expertise in the VBWG on voice technologies and the Web-based standards for voice. We're particularly aware of the network constraints as well.
... It's still important for us to support networked processing.
... The devices are more powerful, sure, but the processing is getting more complicated.
... Imprecision is something voice folks know well. I mean imprecision in the scientific definition.
... Look at geolocation. Whether it's wifi, cell tower or GPS based, there's an error factor to it, an imprecision to it.
... The same is true of voice systems. There's an imprecision to it; that doesn't mean it's not useful information.
... We have a lot of experience in knowing how to properly exploit confidence information when using an imprecise technology.
... When we talk we can explain things, we can say many things, sometimes without saying anything. We have a lot of experience in how to not only deal with inputs given to the technology but how to encourage people to provide inputs that work better with the technology.
... Slide 8.
... Going to talk about myths that surround ASR and TTS.
... I use the word lies because after you leave this meeting, if you spread it, you're lying!
... First up: speech is too error-prone.
... Has anyone not used speech technology?
... One person has said they haven't.

<DKA> FYI I just used google voice search last night on a mobile (Android) device and it worked very well (on a noisy street-corner).
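[Editorial aside: the confidence information mentioned above is exposed directly in VoiceXML 2.0 -- each recognition result carries a confidence score between 0.0 and 1.0 in application.lastresult$. A hedged sketch of using it; the field name, grammar URI, and 0.5 threshold are illustrative:]

```xml
<form id="city_form">
  <field name="city">
    <grammar src="cities.grxml" type="application/srgs+xml"/>
    <prompt>Which city?</prompt>
    <filled>
      <!-- Low-confidence results are re-collected instead of being
           silently accepted; clearing the field makes the form
           interpretation algorithm visit it again. -->
      <if cond="application.lastresult$.confidence &lt; 0.5">
        <prompt>Sorry, I'm not sure I caught that.</prompt>
        <clear namelist="city"/>
      <else/>
        <prompt>Searching near <value expr="city"/>.</prompt>
      </if>
    </filled>
  </field>
</form>
```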
<glazou> darobin: I have at least 3 devices able to speak and recognize speech but I never enabled it
<glazou> darobin: even my car does that

burn: I'm not sure I believe it, but it's almost impossible to call somewhere without having an opportunity to speak rather than push buttons.

<ChrisL> use of voice menus may vary around the world

ChrisL: Some of that could be geographical restrictions.
burn: Recognition on small mobile devices is probably not as good as that in the network. The state of the art has progressed a lot, but in telephony it can take a few years.

<DKA> (but I have a flat mid-atlantic american accent)

burn: GOOG-411, for instance.
... Specifically, GOOG-411 is there to collect samples to improve their speech recognition.
... It's increasingly accurate, and as long as you are using it for the cases for which it was intended, it works fairly well for the majority of people.
... The 2nd myth is that mobile device screens will be the primary input.
... Voice has been the primary input for a while.

??: before they even had screens.

burn: We have a strong belief that voice will remain a strong component for a long time.
... The last myth is about Web developers.
... We hear it's too complicated, etc.
... The best people to use this technology today are Web developers.
... I've worked at companies that both provide and use voice technology.
burn: A big plus has been to be able to tap into the Web development community.
... Making these technologies more accessible to the average Web developer is something we continue to do.
... Slide 9.
... What are we working on today?
... Three main high-level goals: get out into the world what voice can do, what it does today in the call center world, what it does today in the mobile world, and what we expect it to do in the future of the Web.
... We want to see future languages at W3C, including ours, play well together.
... We haven't seen much integration between specs.
... This is something we've been giving a lot of attention to since the TPAC. There's been an increasing interest in this.

ChrisL: We want to increase the flexibility of the language today.
burn: We're making quite a number of changes that generalize aspects of VoiceXML that are hard coded today.
... In VoiceXML 3 we're generalizing it in ways that allow developers to create a paradigm for your dialog construct, and then allow other developers to make use of that paradigm.
... It will simplify the coding for the second group of developers.
... Slide 10.
... We need you!
... Don't just pass this along to your WGs (but do that too); we need you to do it.
... I've listed three sites with free access for developers.
... They provide free hosting to try out voice applications.
... Each one has a tutorial for how to build these.
... E.g. build a front end to your phone, something that lets callers choose to leave a message, ask for a callback, whatever. It's easy to do.
... Try it out. There are a lot of cool things you can do without a lot of work.
... We need people on this call to work with us on building a combined group of people to build the best future applications.
... The VBWG does not contain HTML experts.
... The HTML WG probably does not contain the world's experts on VoiceXML.
... We think it'd be really helpful to get a subset of the two groups together to figure out how best to build these combined applications in the future.
... Learn about voice, it's not that hard.

<Zakim> janina, you wanted to say i have too much frustration with it, though

burn: Figure out how to create a combined group of experts for building future applications of the Web.

Q&A

janina: Thanks for the overview and the mentions of accessibility.
... I think of two kinds of things that might be helpful around imprecision and ambiguity.
... Is anyone experimenting with military style alphabets?
... And is anyone building in changes to adapt to the user's working rate? E.g. barge-in.
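[Editorial aside: the "front end to your phone" exercise suggested above is only a few lines of VoiceXML 2.0. A sketch under illustrative assumptions -- the submit target, prompts, and 60-second limit are made up, and a real deployment would add grammars and error handling:]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Top-level menu: caller says "message" or "callback". -->
  <menu>
    <prompt>Say message to leave a message, or callback to request a call back.</prompt>
    <choice next="#leave_message">message</choice>
    <choice next="#callback">callback</choice>
  </menu>
  <form id="leave_message">
    <record name="msg" beep="true" maxtime="60s">
      <prompt>Please speak after the beep.</prompt>
      <filled>
        <submit next="save_message.cgi" namelist="msg"/>
      </filled>
    </record>
  </form>
  <form id="callback">
    <block><prompt>Okay, I will ask them to call you back.</prompt></block>
  </form>
</vxml>
```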
<shepazu> (Hotel Tango Foxtrot)

janina: Even if you barge in, the system still tends to speak slowly, which is fine for 90% of users, but blind users know you don't get much work done at the slower speech rate.
... I'd get frustrated quickly and wouldn't do very much.
burn: Last one first: VoiceXML is working on Real Time Controls (RTCs), specialized grammars that act instantly and can cause certain things to occur.
... The primary use cases are speeding up/slowing down/volume adjustments, etc.
... It's not easy to do today in VoiceXML 2, but it is something we are improving in the language.
... Military style alphabets: that is definitely something that is more of an application design issue.
... One of the situations where we find that being used more often is when someone needs to give an id, or a policy number, that is not just numeric but alphanumeric.
... Voxeo, my company, does recommend that.
... Nuance recommended that more than ten years ago, spending a lot of time figuring out what words people tend to use, not just the military alphabet.
ChrisL: Rather than having to spell out stock symbols, it's probably easier to say the company name.
burn: Well, the policy numbers is a better example perhaps.
janina: The thing I was trying would recognize ??

<Zakim> ChrisL, you wanted to ask about separate voice interfaces vs multimodal

ChrisL: There is this impression that the voice folks work on one side to make call centers better, yet that has little to do with the Web.
... But the idea of having a multi-modal site that is primarily visual, but where you want people to be able to fill in one of the forms by saying 'pick this field and fill it in' -- I don't see much deployment of that.
... That is traditionally seen as up to the browser.
... It's seen as a second-class translation that the Web developer shouldn't have to do anything for.
burn: I think that has been a limitation for desktop based devices.
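[Editorial aside: the spelling-alphabet approach janina asks about is typically implemented as an SRGS grammar with semantic interpretation tags mapping each code word to its letter. A minimal sketch, truncated to three letters; a full grammar would cover the whole alphabet plus digits and the common non-military variants Nuance studied:]

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="voice" root="letter" tag-format="semantics/1.0">
  <rule id="letter" scope="public">
    <one-of>
      <!-- Each code word returns the letter it stands for. -->
      <item>alpha <tag>out = "A";</tag></item>
      <item>bravo <tag>out = "B";</tag></item>
      <item>charlie <tag>out = "C";</tag></item>
      <!-- ... remaining letters and digits ... -->
    </one-of>
  </rule>
</grammar>
```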
... Historically that's meant people using it who don't have speakers or microphones, or don't have them turned on, etc., but that's not the case on mobile.
... On mobile it's often the case that the mic and the screen are on simultaneously.
... The voice interface, due to how we learned to speak, is not nearly as simple as taking a Web page and making those words available.
... Through experience we've discovered that it's important to have explicit control over the voice interface.
... You create a Web page, and maybe you have a fancy browser that lets you talk to it -- that's great -- but to make a voice user interface that is good and efficient for people, you really want to code it directly.
ChrisL: I didn't want to imply that they wanted to do it that way. It's just seen as a client coding issue.

<burn> yes, we're out of time, but I can stay as long as the group wants

<Zakim> shepazu, you wanted to mention DAP

Workshop Announcement

<dsr> W3C invites people to participate in a Workshop on Future Standards for Model-Based User Interfaces on 13-14 May 2010 in Rome. This Workshop will examine the challenges facing Web developers due to variations in device capabilities, modes of interaction and software standards, the need to support assistive technologies for accessibility, and the demand for richer user interfaces.
<dsr> The Workshop will focus on reviewing research on model-based design of context-sensitive user interfaces in relation to these challenges, and the opportunities for new open standards in the area of Model-Based User Interfaces. The Workshop, hosted by CNR-ISTI[23], is free of charge and open to anyone, subject to review of their statement of interest and space availability. Statements of Interest are due 2 April 2010. See the call for participation[24] for more information.

dsr: The Model Based UI Incubator Group is hosting a workshop mid-May.
<dsr> [23] http://www.isti.cnr.it/
[23] http://www.isti.cnr.it/
<dsr> [24] http://www.w3.org/2010/02/mbui/cfp.html
[24] http://www.w3.org/2010/02/mbui/cfp.html

ChrisL: Please continue discussing on the list.

Doug's questions

shepazu: Shouldn't this be made public?
burn: Personally, I agree; I doubt there's anything that would be a problem, but we just need a quick check with the groups.
... I'm also going to be doing a project review with W3C staff.
... Absolutely we want this to get out, even though this presentation was definitely targeted at this group.
shepazu: I think you need to be talking to DAP; they're covering cameras, and I assume they either are or should be using microphones as well.
shepazu: I think this is tightly integrated with MMI, but with the momentum behind Web apps, I agree that you need to work with a larger number of groups.
burn: That was our belief and what we want to do.

<kaz> [adjourned]

Summary of Action Items

[End of minutes]
Received on Wednesday, 24 February 2010 13:38:53 UTC