[minutes] 30 June 2011 (incl. WebAPI subgroup)


The minutes from the last call are available at http://www.w3.org/2011/06/30-htmlspeech-minutes.html.

For convenience, a text version is embedded below.

Note that the majority of this call was actually a meeting of the WebAPI Subgroup.

Thanks to Debbie Dahl for taking the minutes!

-- dan


                              - DRAFT -

              HTML Speech Incubator Group Teleconference

30 Jun 2011


      [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0078.html

   See also: [3]IRC log

      [3] http://www.w3.org/2011/06/30-htmlspeech-irc


          Dan_Burnett, Patrick_Ehlen, Michael_Johnston, Olli_Pettay,
          Michael_Bodell, Dan_Druta, Debbie_Dahl, Charles_Hemphill,
          Glen_Shires, Bjorn_Bringert, Satish_Sampath





     * Topics
         1. review updated final report draft
         2. approve proposed changes to report draft
         3. status report from the WebAPI subgroup
         4. WebAPI subgroup
     * Summary of Action Items

review updated final report draft

   dan: email me if you have problems


     [10] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html

approve proposed changes to report draft

   dan: marc suggested wording changes to requirements, we should
   ... i don't agree with all of them, redundancy isn't a problem
   ... propose making changes based on our current understanding. let
   me know if you have concerns.

status report from the WebAPI subgroup

   dan: we'll start with the status and bring up anything that should
   be discussed in the larger group
   ... fyi, i will step away for half an hour, starting half an hour
   into the call

   michael: will start discussing drafts

   dan: any general discussion?

   michael: not yet.
   ... Raj is doing summary of requirements and design decisions, we
   don't know if there will be directional changes.

   dan: is there any discussion from the rest of the group?

WebAPI subgroup

   danD: the idea was that I can create an object that isn't
   necessarily the ASR or TTS object, and then I can bind to the
   ... the protocol will drive some of the parameters
   ... will send an update based on bjorn's comments

   bjorn: i'm fine with the functionality, but maybe we do need two

   danD: will try to blend proposal with bjorn's comments

   michael: do we agree or not on two vs. one interface?

   danD: I don't know at the time when i do the query what services
   will be provided, TTS, ASR, or both

   bjorn: does it make sense to have a service that can provide both?

   michael: we do have a discussion point on this

   danD: having an interface bridge won't hurt

   bjorn: my objection to having a single one is that it makes the
   interface more complicated
   ... i want to be able to handle the case where i have one or the
   other or both

   michael: other comments on Dan's interface?

   danD: this won't be a full-fledged API or module in itself, it's
   just initialization
   ... we should start building a table saying "these are the things I
   want to identify"

   bjorn: if i want to have support for ASR or TTS it's hard to see
   what the API is. what if they are two different services. you have
   to do a bunch of checking flags.

   olli: it depends on whether the parameters are the same for both

   bjorn: you also do totally different things with different services.
   there would need to be some kind of generic interface

   michael: it would succeed or fail depending on what you asked it to

   bjorn: it's better to specify two objects than having one giant

   <satish> (I got disconnected and will try calling in again)

   bjorn: it's a syntactic issue

   michael: it also depends on whether there are a lot of services that
   are one or another
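
   Editor's sketch (not from the call): the one-versus-two interface
   question above might look roughly like this in code. Every name
   here (RecognizerService, SynthesizerService, CombinedService, the
   example.invalid URIs) is hypothetical; the group had not agreed on
   any API shape at this point.

```typescript
// Hypothetical shapes only; none of these names come from the minutes.

// Option A: two separate interfaces, one per kind of service.
interface RecognizerService {
  kind: "asr";
  uri: string;
  recognize(audioLabel: string): string;
}

interface SynthesizerService {
  kind: "tts";
  uri: string;
  speak(text: string): string;
}

// Option B: one combined interface; callers check flags before each use.
interface CombinedService {
  uri: string;
  supportsAsr: boolean;
  supportsTts: boolean;
  recognize?: (audioLabel: string) => string;
  speak?: (text: string) => string;
}

// With Option A the parameter type already says what the object can do.
function useRecognizer(service: RecognizerService): string {
  return service.recognize("utterance-1");
}

// With Option B every call site needs the kind of flag checking bjorn describes.
function useCombined(service: CombinedService): string {
  if (service.supportsAsr && service.recognize !== undefined) {
    return service.recognize("utterance-1");
  }
  return "recognition unavailable";
}

const asrOnly: RecognizerService = {
  kind: "asr",
  uri: "https://example.invalid/asr",
  recognize: (audioLabel) => `transcript of ${audioLabel}`,
};

const ttsOnly: SynthesizerService = {
  kind: "tts",
  uri: "https://example.invalid/tts",
  speak: (text) => `audio for ${text}`,
};

// The same TTS-only service expressed through the combined interface.
const ttsOnlyCombined: CombinedService = {
  uri: ttsOnly.uri,
  supportsAsr: false,
  supportsTts: true,
  speak: ttsOnly.speak,
};
```

   The trade-off is mostly syntactic, as bjorn notes: both options can
   express a service that offers ASR, TTS, or both.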

   bjorn: what parameters do you need to specify? URI, language,
   non-standard things like non-standard grammar format.

   michael: other parameters?

   michaelJ: grammar?

   bjorn: this is querying for capabilities of the recognizer
   ... it would make sense for the grammar to be a parameter, for
   example if you had some specific grammars, like "support for a
   specific grammar like 'date'".

   michael: that could be for the moral equivalent of the builtins

   dan: we're touching on some issues that we've already decided on, so
   we shouldn't revisit decisions that we already made

   bjorn: standard queries would be grammar, language, and
   vendor-specific, so it doesn't matter too much if we have one API or

   michael: you may want to give them to the recognizer, not get them
   back from the recognizer

   danD: we talked about not wanting to disclose what the application
   wanted to do.

   bjorn: should get a list of what grammars and languages the
   recognizer supports

   michael: it should accept a list of grammars and languages as its
   criteria and you get an engine back
   ... should return failure if the service can't support all the
   languages, but in the case of languages you might want to know if
   the service supports a subset
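
   Editor's sketch (not from the call): one possible reading of
   michael's description is a query that takes grammars and languages
   as criteria, returns an engine only when everything is supported,
   and otherwise reports which requested languages were available.
   The types, the function name findService, and the catalog entries
   are all assumptions made for illustration.

```typescript
// Hypothetical types; none of these names come from the minutes.
interface ServiceDescription {
  uri: string;
  languages: string[];
  grammars: string[];
}

interface Criteria {
  languages?: string[];
  grammars?: string[];
}

type QueryResult =
  | { ok: true; service: ServiceDescription }
  | { ok: false; supportedLanguages: string[] };

function findService(services: ServiceDescription[], criteria: Criteria): QueryResult {
  const wantedLanguages = criteria.languages ?? [];
  const wantedGrammars = criteria.grammars ?? [];
  let bestPartial: string[] = [];
  for (const service of services) {
    const matched = wantedLanguages.filter((lang) => service.languages.indexOf(lang) !== -1);
    const hasAllGrammars = wantedGrammars.every((g) => service.grammars.indexOf(g) !== -1);
    // Success requires every requested grammar and every requested language.
    if (hasAllGrammars && matched.length === wantedLanguages.length) {
      return { ok: true, service };
    }
    // Otherwise remember the largest language subset a grammar-capable service offered.
    if (hasAllGrammars && matched.length > bestPartial.length) {
      bestPartial = matched;
    }
  }
  return { ok: false, supportedLanguages: bestPartial };
}

// Made-up catalog for illustration.
const exampleCatalog: ServiceDescription[] = [
  { uri: "service-a", languages: ["en-US", "fi-FI"], grammars: ["date"] },
  { uri: "service-b", languages: ["en-US"], grammars: [] },
];
```

   Whether a failed query should reveal the supported subset at all is
   exactly the privacy question the group goes on to discuss.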

   bjorn: someone could pass in a list of all the languages in the

   olli: the user agent should be able to ask the user

   danD: if i just ask what languages you support, how is that a
   privacy issue?

   olli: if the service supports only Finnish and English, you could
   guess that i'm Finnish

   <bringert> I got disconnected

   michael: you could also use the API for the local device that always
   has the user's language on it.
   ... services don't have to necessarily be honest about their answers

   glenn: this seems like a major limitation that we're putting on
   developers for privacy reasons.

   bjorn: regardless, we should say "give me a service that supports
   XYZ", and it's ok for the service to say "no comment"

   michael: we want to allow the user to customize the service

   charles: web servers already get the locale

   olli: getting supported languages is just another data about the

   bjorn: most common use case is ASR and TTS for locale, so how about
   if we just get the locale language

   olli: that might work

   danD: so far, we should be able to provide the filter criteria for
   the grammar and the language, it should be optional, will get
   another version, we can discuss further

   bjorn: we could say that the default locale language is supported,
   it's the additional languages that are supported that we have to
   think about
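
   Editor's sketch (not from the call): bjorn's two suggestions, that
   a service may answer "no comment" and that the default locale
   language is always treated as supported, could combine as below.
   DisclosurePolicy and answerLanguageQuery are hypothetical names,
   and the policy flag is an assumed stand-in for whatever the user or
   user agent decides.

```typescript
type SupportAnswer = "yes" | "no" | "no-comment";

// Hypothetical policy object; not part of any proposal in the minutes.
interface DisclosurePolicy {
  defaultLocale: string;
  installedLanguages: string[];
  discloseAdditionalLanguages: boolean;
}

function answerLanguageQuery(policy: DisclosurePolicy, language: string): SupportAnswer {
  // The default locale is treated as always supported.
  if (language === policy.defaultLocale) {
    return "yes";
  }
  // For any other language the service may decline to answer.
  if (!policy.discloseAdditionalLanguages) {
    return "no-comment";
  }
  return policy.installedLanguages.indexOf(language) !== -1 ? "yes" : "no";
}

const cautiousPolicy: DisclosurePolicy = {
  defaultLocale: "en-US",
  installedLanguages: ["en-US", "fi-FI"],
  discloseAdditionalLanguages: false,
};

const openPolicy: DisclosurePolicy = {
  defaultLocale: "en-US",
  installedLanguages: ["en-US", "fi-FI"],
  discloseAdditionalLanguages: true,
};
```

   Under the cautious policy a page asking about Finnish learns
   nothing, which addresses olli's concern about guessing the user's
   background from the language list.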

   danD: will start a table of other attributes that should be
   available at initialization
   ... and will get an update

   michael: now look at HTML bindings

   bjorn: would like there to be an element that can be standalone or
   enclosed in other elements
   ... not sure about control element
   ... the important things for me on the recognition element, it
   should be possible for the web app author to put it on a form

   olli: how do you actually bind the value?

   bjorn: the definition of a value for a form control is that it's
   always a string without formatting
   ... not so obvious for checkbox, it has to be defined for each type
   ... it's the kind of thing you put in the "value" attribute for
   non-text elements
   ... for textarea or content editable it's the text
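
   Editor's sketch (not from the call): bjorn's point that the bound
   value is always a plain string, with the meaning defined per
   control type, might be modelled as below. The FormControl union is
   a simplified stand-in for real DOM elements, and the checkbox rule
   (checked when the recognized string equals the control's "value")
   is purely an assumption; the minutes only say it has to be defined.

```typescript
// Simplified stand-ins for form controls; not real DOM types.
type FormControl =
  | { type: "text"; value: string }
  | { type: "textarea"; text: string }
  | { type: "checkbox"; value: string; checked: boolean };

function applyRecognitionValue(control: FormControl, recognized: string): FormControl {
  switch (control.type) {
    case "text":
      // A text input simply takes the recognized string as its value.
      return { type: "text", value: recognized };
    case "textarea":
      // For a textarea the bound value is the text content.
      return { type: "textarea", text: recognized };
    case "checkbox":
      // Assumption: checked when the recognized string equals the "value" attribute.
      return { type: "checkbox", value: control.value, checked: recognized === control.value };
  }
}
```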

   olli: automatic binding in X+V was annoying

   michael: the difference is the optionality, you don't have to do it.
   as for the microphone, the reco image is platform-specific,
   microphone, button, etc.

   olli: the graphical presentation could be problematic

   bjorn: each browser will have to decide what security model it wants
   to implement

   michael: not sure about usefulness of the form, but the "for" does
   seem useful

   bjorn: form is just a convenience

   <burn> hey, sounds like bjorn wants voicexml :)

   bjorn: should we look at label?
   ... the HTML label does what we want
   ... we want to do the same things that label does

   olli: when will user give permission?

   michael: each browser will be different
   ... some people want the button to appear on the screen without
   asking permission

   bjorn: Google Voice search, for example, you don't want to have to
   prompt the user every time

   olli: worried about when user will give permission

   bjorn: easier in the CaptureAPI case if there's no markup

   michael: you need to check for permission when you do the reco, not
   just to have a reco object

   olli: if the user never wants speech, maybe the browser doesn't even
   render the microphone

   bjorn: olli, are you still concerned about consistency of permission

   olli: my concerns are that the user agent needs permission before
   using the reco object

   bjorn: is the CaptureAPI similar to the Javascript recognition API?

   olli: you get similar data in CaptureAPI and reco


     [11] http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication

   bjorn: you can get a "permission denied" error code, that's very
   similar to our API

   michael: what doesn't work is that the permission check happens
   before the binding

   danD: there are two steps, one the rendering of the object, and then
   the user decides to use that UI element, and that's a privacy and
   consent issue
   ... it makes more sense if it doesn't even prompt the user until it
   knows something is there

   olli: a query to find out what kind of recognizer object is
   available is ok

   bjorn: do you see a problem with the HTML API having a different
   ... i think browsers should implement permission after the user
   clicks the button

   olli: what if the user has already started speaking?

   bjorn: no permission could either cancel or not start recognition

   michael: user should be able to revoke permission

   bjorn: these things are up to the user agent, having the Javascript
   API and the button should make it possible to implement appropriate
   privacy and security
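
   Editor's sketch (not from the call): the permission points above
   (check when recognition starts rather than when the object is
   created, surface a "permission denied" error, and let the user
   revoke permission mid-recognition) could fit together as below.
   SketchRecognizer, its fields, and the "permission-denied" string
   are hypothetical; how and when to ask is left to the user agent, as
   bjorn says.

```typescript
type Permission = "granted" | "denied";
type RecoState = "idle" | "listening" | "cancelled";

// Hypothetical recognizer; creating one never prompts the user.
class SketchRecognizer {
  state: RecoState = "idle";
  lastError: string | null = null;
  // Stands in for whatever prompt or stored preference the user agent uses.
  askUser: () => Permission;

  constructor(askUser: () => Permission) {
    this.askUser = askUser;
  }

  // Permission is checked here, at recognition time.
  start(): boolean {
    if (this.askUser() === "denied") {
      this.lastError = "permission-denied";
      return false; // recognition does not start
    }
    this.lastError = null;
    this.state = "listening";
    return true;
  }

  // Revoking permission cancels a recognition that is already running.
  revokePermission(): void {
    if (this.state === "listening") {
      this.state = "cancelled";
      this.lastError = "permission-denied";
    }
  }
}
```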

   michael: let's move on, because there are other topics
   ... do we agree that we don't need HTML bindings for TTS?

   bjorn: don't have anything against it, but maybe a waste of time.

   michael: we can leave it as it is for now.

   let's start on bjorn's speech recognition events, similar to what i
   sent before the f2f

   scribe: added timestamps, there are also a number of error codes
   that we need to agree on
   ... what about nomatch and noinput, are they errors or kinds of

   michael: i think they're different types of result
   ... nomatch seems like a result, but noinput seems like a different
   kind of event

   dan: we look at rejections

   michael: if rejection was just below confidence you may want to look
   at that.

   charles: noinput could be like a volume issue

   michael: nospeech would not generate an nbest on our platform

   dan: for us it would be the same way

   glenn: why have multiple events instead of a single event that
   returns different parameters?

   michael: i don't think you're typically doing the same thing with
   noinput vs. nomatch

   charles: it's nice to have the engine decide if it's a nomatch

   dan: sometimes the engine ends up with no answer, the vast majority
   of nomatch is confidence-based

   glenn: should make sure that results returned are in as similar a
   format as possible

   bjorn: what about nospeech?

   dan: error to me means that something broke, not like a normal
   expected user situation

   bjorn: the distinction between error and normal is not always clear

   dan: true user interface behavior is not an error, "abort" would
   only be an error if you grouped together user-initiated abort and
   engine abort

   bjorn: are permission problems or network problems errors?

   michael: would not consider abort or noinput errors

   glenn: I would tie them all into the same event, that would be
   simpler for the developer

   michael: in the continuous case you don't care about noinput

   dan: we won't resolve this in the remaining time.

   michael: we can continue discussion on the list
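
   Editor's sketch (not from the call): the unresolved question of
   whether nomatch, noinput, and errors are distinct events or one
   event with different parameters can be illustrated with a single
   tagged outcome type. The field names, the classify function, and
   the confidence threshold are all assumptions; the only points taken
   from the discussion are that most nomatch cases are
   confidence-based, that noinput produces no n-best list, and that
   "error" is reserved for something actually breaking.

```typescript
// Hypothetical outcome shapes; the group had not settled on event names or fields.
type RecoOutcome =
  | { kind: "result"; transcript: string; confidence: number }
  | { kind: "nomatch"; bestGuess: string | null; confidence: number }
  | { kind: "noinput" }
  | { kind: "error"; code: "network" | "permission-denied" };

interface EngineOutput {
  heardSpeech: boolean;
  hypotheses: { transcript: string; confidence: number }[];
}

function classify(output: EngineOutput, threshold: number): RecoOutcome {
  // No speech detected: no n-best list is produced at all.
  if (!output.heardSpeech) {
    return { kind: "noinput" };
  }
  const first = output.hypotheses[0];
  // The engine heard speech but ended up with no answer.
  if (first === undefined) {
    return { kind: "nomatch", bestGuess: null, confidence: 0 };
  }
  let top = first;
  for (const hypothesis of output.hypotheses) {
    if (hypothesis.confidence > top.confidence) {
      top = hypothesis;
    }
  }
  // Confidence-based rejection keeps the best guess so the app can inspect it.
  if (top.confidence < threshold) {
    return { kind: "nomatch", bestGuess: top.transcript, confidence: top.confidence };
  }
  return { kind: "result", transcript: top.transcript, confidence: top.confidence };
}

// Only outcomes where something broke count as errors in this sketch.
function isError(outcome: RecoOutcome): boolean {
  return outcome.kind === "error";
}
```

   A single tagged type keeps the result format uniform, as glenn
   asks, while still letting a handler branch on kind, which is
   michael's point about treating noinput and nomatch differently.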

Received on Thursday, 30 June 2011 17:50:35 UTC