- From: Dan Burnett <dburnett@voxeo.com>
- Date: Thu, 30 Jun 2011 13:49:48 -0400
- To: public-xg-htmlspeech@w3.org
Group,

The minutes from the last call are available at http://www.w3.org/2011/06/30-htmlspeech-minutes.html.

For convenience, a text version is embedded below.

Note that the majority of this call was actually a meeting of the WebAPI Subgroup.

Thanks to Debbie Dahl for taking the minutes!

-- dan

**********************************************************************************

- DRAFT -

HTML Speech Incubator Group Teleconference

30 Jun 2011

[2]Agenda

   [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0078.html

See also: [3]IRC log

   [3] http://www.w3.org/2011/06/30-htmlspeech-irc

Attendees

   Present
          Dan_Burnett, Patrick_Ehlen, Michael_Johnston, Olli_Pettay, Michael_Bodell, Dan_Druta, Debbie_Dahl, Charles_Hemphill, Glen_Shires, Bjorn_Bringert, Satish_Sampath

   Regrets
          Raj_Tumuluri

   Chair
          Dan_Burnett

   Scribe
          ddahl

Contents

  * [4]Topics
      1. [5]review updated final report draft
      2. [6]approve proposed changes to report draft
      3. [7]status report from the WebAPI subgroup
      4. [8]WebAPI subgroup
  * [9]Summary of Action Items
  _________________________________________________________

review updated final report draft

   dan: email me if you have problems

   [10]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html

     [10] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html

approve proposed changes to report draft

   dan: marc suggested wording changes to requirements, we should approve
   ... i don't agree with all of them, redundancy isn't a problem
   ... propose making changes based on our current understanding. let me know if you have concerns.

status report from the WebAPI subgroup

   dan: we'll start with the status and bring up anything that should be discussed in the larger group
   ... fyi, will leave for half an hour, half an hour in

   michael: will start discussing drafts

   dan: any general discussion?

   michael: not yet.
   ... Raj is doing summary of requirements and design decisions, we don't know if there will be directional changes.
   dan: is there any discussion from the rest of the group?

WebAPI subgroup

   danD: the idea was that I can create an object that isn't necessarily the ASR or TTS object, and then I can bind to the service.
   ... the protocol will drive some of the parameters
   ... will send an update based on bjorn's comments

   bjorn: i'm fine with the functionality, but maybe we do need two objects

   danD: will try to blend proposal with bjorn's comments

   michael: do we agree or not on two vs. one interface?

   danD: I don't know at the time when i do the query what services will be provided, TTS, ASR, or both

   bjorn: does it make sense to have a service that can provide both?

   michael: we do have a discussion point on this

   danD: having an interface bridge won't hurt

   bjorn: my objection to having a single one is that it makes the interface more complicated
   ... i want to be able to handle the case where i have one or the other or both

   michael: other comments on Dan's interface?

   danD: this won't be a full-fledged API or module in itself, it's just initialization
   ... we should start building a table saying "these are the things I want to identify"

   bjorn: if i want to have support for ASR or TTS it's hard to see what the API is. what if they are two different services. you have to do a bunch of checking flags.

   olli: it depends on whether the parameters are the same for both cases.

   bjorn: you also do totally different things with different services. there would need to be some kind of generic interface

   michael: it would succeed or fail depending on what you asked it to do.

   bjorn: it's better to specify two objects than having one giant object

   <satish> (I got disconnected and will try calling in again)

   bjorn: it's a syntactic issue

   michael: it also depends on whether there are a lot of services that are one or another

   bjorn: what parameters do you need to specify? URI, language, non-standard things like non-standard grammar format.

   michael: other parameters?

   michaelJ: grammar?
   bjorn: this is querying for capabilities of the recognizer
   ... it would make sense for the grammar to be a parameter, for example if you had some specific grammars, like "support for a specific grammar like 'date'".

   michael: that could be for the moral equivalent of the builtins

   dan: we're touching on some issues that we've already decided on, so we shouldn't revisit decisions that we already made

   bjorn: standard queries would be grammar, language, and vendor-specific, so it doesn't matter too much if we have one API or two

   michael: you may want to give them to the recognizer, not get them back from the recognizer

   danD: we talked about not wanting to disclose what the application wanted to do.

   bjorn: should get a list of what grammars and languages the recognizer supports

   michael: it should accept a list of grammars and languages as its criteria and you get an engine back
   ... should return failure if the service can't support all the languages, but in the case of languages you might want to know if the service supports a subset

   bjorn: someone could pass in a list of all the languages in the world

   olli: the user agent should be able to ask the user

   danD: if i just ask what languages you support, how is that a privacy issue?

   olli: if the service supports only Finnish and English, you could guess that i'm Finnish

   <bringert> I got disconnected

   michael: you could also use the API for the local device that always has the user's language on it.
   ... services don't have to necessarily be honest about their answers

   glenn: this seems like a major limitation that we're putting on developers for privacy reasons.
   bjorn: regardless, we should say "give me a service that supports XYZ", and it's ok for the service to say "no comment"

   michael: we want to allow the user to customize the service

   charles: web servers already get the locale

   olli: getting supported languages is just another piece of data about the user

   bjorn: most common use case is ASR and TTS for locale, so how about if we just get the locale language

   olli: that might work

   danD: so far, we should be able to provide the filter criteria for the grammar and the language, it should be optional, will get another version, we can discuss further

   bjorn: we could say that the default locale language is supported, it's the additional languages that are supported that we have to think about

   danD: will start a table of other attributes that should be available at initialization
   ... and will get an update

   michael: now look at HTML bindings

   bjorn: would like there to be an element that can be standalone or enclosed in other elements
   ... not sure about control element
   ... the important things for me on the recognition element, it should be possible for the web app author to put it on a form

   olli: how do you actually bind the value?

   bjorn: the definition of a value for a form control is that it's always a string without formatting
   ... not so obvious for checkbox, it has to be defined for each type
   ... it's the kind of thing you put in the "value" attribute for non-text elements
   ... for textarea or content editable it's the text

   olli: automatic binding in X+V was annoying

   michael: the difference is the optionality, you don't have to do it. as for the microphone, the reco image is platform-specific, microphone, button, etc.
   olli: the graphical presentation could be problematic

   bjorn: each browser will have to decide what security model it wants to implement

   michael: not sure about usefulness of the form, but the "for" does seem useful

   bjorn: form is just a convenience

   <burn> hey, sounds like bjorn wants voicexml :)

   bjorn: should we look at label?
   ... the HTML label does what we want
   ... we want to do the same things that label does

   olli: when will user give permission?

   michael: each browser will be different
   ... some people want the button to appear on the screen without asking permission

   bjorn: Google Voice search, for example, you don't want to have to prompt the user every time

   olli: worried about when user will give permission

   bjorn: easier in the CaptureAPI case if there's no markup

   michael: you need to check for permission when you do the reco, not just to have a reco object

   olli: if the user never wants speech, maybe the browser doesn't even render the microphone

   bjorn: olli, are you still concerned about consistency of permission policy?

   olli: my concerns are that the user agent needs permission before using the reco object

   bjorn: is the CaptureAPI similar to the Javascript recognition API?

   olli: you get similar data in CaptureAPI and reco

   <smaug> [11]http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication

     [11] http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication

   bjorn: you can get a "permission denied" error code, that's very similar to our API

   michael: what doesn't work is that the permission check happens before the binding

   danD: there are two steps, one the rendering of the object, and then the user decides to use that UI element, and that's a privacy and consent issue
   ... it makes more sense if it doesn't even prompt the user until it knows something is there

   olli: a query to find out what kind of recognizer object is available is ok

   bjorn: do you see a problem with the HTML API having a different method?
   ... i think browsers should implement permission after the user clicks the button

   olli: what if user has already started speaking

   bjorn: no permission could either cancel or not start recognition

   michael: user should be able to revoke permission

   bjorn: these things are up to the user agent, having the Javascript API and the button should make it possible to implement appropriate privacy and security

   michael: move on, because other topics
   ... do we agree that we don't need HTML bindings for TTS?

   bjorn: don't have anything against it, but maybe a waste of time.

   michael: we can leave it as it is for now. let's start on bjorn's speech recognition events, similar to what i sent before the f2f

   scribe: added timestamps, there are also a number of error codes that we need to agree on
   ... what about nomatch and noinput, are they errors or kinds of input?

   michael: i think they're different types of result
   ... nomatch seems like a result, but noinput seems like a different kind of event

   dan: we look at rejections

   michael: if rejection was just below confidence you may want to look at that.

   charles: noinput could be like a volume issue

   michael: nospeech would not generate an nbest on our platform

   dan: for us it would be the same way

   glenn: why have multiple events instead of a single event that returns different parameters?

   michael: i don't think you're typically doing the same thing with noinput vs. nomatch

   charles: it's nice to have the engine decide if it's a nomatch

   dan: sometimes the engine ends up with no answer, the vast majority of nomatch is confidence-based

   glenn: should make sure that results returned are in as similar a format as possible

   bjorn: what about nospeech?
   dan: error to me means that something broke, not like a normal expected user situation

   bjorn: the distinction between error and normal is not always clear

   dan: true user interface behavior is not an error, "abort" would only be an error if you grouped together user-initiated abort and engine abort

   bjorn: are permission problems or network problems errors?

   michael: would not consider abort or noinput errors

   glenn: I would tie them all into the same event, that would be simpler for the developer

   michael: in the continuous case you don't care about noinput

   dan: we won't resolve this in the remaining time.

   michael: we can continue discussion on the list
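
Editorial illustration, not part of the minutes as scribed: the question of whether nomatch and noinput are errors or kinds of result can be sketched as a single outcome type with a discriminating field, roughly the "single event that returns different parameters" option glenn raised. Every name here (Hypothesis, RecoOutcome, classify, isError, the error codes) is hypothetical and assumed for illustration only; the group did not agree on any of them, and the confidence threshold rule is just one reading of "the vast majority of nomatch is confidence-based".

```typescript
// Hypothetical shapes; none of these names come from the group's drafts.
interface Hypothesis {
  utterance: string;
  confidence: number; // assumed to be in the range 0..1
}

// One outcome type, discriminated by `kind`, covering the cases discussed.
type RecoOutcome =
  | { kind: "result"; nbest: Hypothesis[] }   // accepted recognition
  | { kind: "nomatch"; nbest: Hypothesis[] }  // rejected, n-best kept for inspection
  | { kind: "noinput" }                       // no speech detected, so no n-best
  | { kind: "error"; code: "permission-denied" | "network" }; // something broke

// Assumed rule: no speech gives noinput; a best hypothesis below the
// threshold (or no hypothesis at all) gives nomatch; otherwise a result.
function classify(
  speechDetected: boolean,
  nbest: Hypothesis[],
  threshold: number
): RecoOutcome {
  if (!speechDetected) {
    return { kind: "noinput" };
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...nbest].sort((a, b) => b.confidence - a.confidence);
  if (sorted.length === 0 || sorted[0].confidence < threshold) {
    return { kind: "nomatch", nbest: sorted };
  }
  return { kind: "result", nbest: sorted };
}

// Under this reading, only "error" means that something broke;
// noinput and nomatch are ordinary user-interface outcomes.
function isError(outcome: RecoOutcome): boolean {
  return outcome.kind === "error";
}
```

A handler written against this shape would switch on `kind`, which keeps the returned formats similar while still letting an application treat noinput and nomatch differently. Separate event types, as michael preferred, would carry the same information split across distinct handlers.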
Received on Thursday, 30 June 2011 17:50:35 UTC