[minutes] 16 June 2011 from Dan Burnett on 2011-06-28 (public-xg-htmlspeech@w3.org from June 2011)

From: Dan Burnett <dburnett@voxeo.com>
Date: Tue, 28 Jun 2011 18:13:44 -0400
To: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-Id: <A083F622-523A-42C6-A4FA-C3B62C1A27B7@voxeo.com>
Group,

The minutes from the last call are available at http://www.w3.org/2011/06/16-htmlspeech-minutes.html.

For convenience, a text version is embedded below.

Thanks to Patrick Ehlen for taking the minutes!

-- dan

**********************************************************************************

                               - DRAFT -

              HTML Speech Incubator Group Teleconference

16 Jun 2011

   [2]Agenda

      [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0033.html

   See also: [3]IRC log

      [3] http://www.w3.org/2011/06/16-htmlspeech-irc

Attendees

   Present
          Milan_Young, Michael_Johnston, Dan_Burnett, Michael_Bodell,
          Olli_Pettay, Dan_Druta, Charles_Hemphill, Patrick_Ehlen,
          Robert_Brown

   Regrets
          Raj_Tumuluri, Bjorn_Bringert

   Chair
          Dan_Burnett

   Scribe
          Patrick_Ehlen

Contents

     * [4]Topics
         1. [5]New design decisions?
         2. [6]markup binding
         3. [7]discussion time
         4. [8]do we need to support audio recording with recognition?
         5. [9]what are the built-ins, and what does that mean?
     * [10]Summary of Action Items
     _________________________________________________________



New design decisions?

   robert: should audio recording without recognition be supported?
   ... are there important scenarios for supporting recording without
   recognition?

   <burn> satish, any update on markup binding?

markup binding

   <satish> burn: None, Bjorn was collecting input from the chrome team
   and since he has gone on leave I have no contact on what the status
   was.

   <burn> satish, can you please check? we are not waiting on the
   answer, but it would be nice to have the input

   robert: Google's open issue is whether there should be a button to press

   <satish> burn: yes, I can take an action to get a definitive answer
   in the next few days.

   burn: satish will take this up with the Chrome team

discussion time

do we need to support audio recording with recognition?

   burn: an advantage could be endpointing.
   ... is that an important criterion in this case as well?

   charles: another question is how real-time is the reco response?
   ... a recording may result in reco later
   ... an identifier might later associate the recording with a reco
   transcription

   burn: that brings up the question of whether we support reco on
   recorded audio

   robert: garbage models could be used to do recording in edge cases
   ... "overloading" recognition
   ... or will recording be a more common task?
   ... Do we think recording with endpointing is important?

   milan: reasons to combine them: channel adaptation; the headers share
   the same structure, so parameters could be reused; both share the
   same network paths. It is convenient to use the same mechanism

   charles: also, the online vs. offline cases

   milan: would most recording be associated with an attempt to
   understand the text in the recording?

   burn: Most significant feature is the endpointing

   milan: in that case, why not just use a dictation model, do reco,
   and save the waveform as a backup?
   ... and how common would that be? If not very common, we could use a
   garbage model (even a "first-class" one)

   burn: seems strange to call recording a weird special case of reco
   ... in favor of using the recorder resource as described in MRCP

   robert: though endpointing may be valuable, would we support a
   "record" object in the API? how would this go all the way to the
   developer?

   burn: does not seem to be in our scope

   olli: there are other proposals that would handle recording

   charles: channel adaptation

   burn: channel normalization is not a valid reason for recording
   support

   milan: should probably also include a built-in record grammar

   milan: use case: may want to do dictation in parallel with command
   and control (c&c)
   ... e.g., a c&c command followed immediately by dictation

   burn: but does that really belong as a built-in type in a grammar?
   ... it sounds like there is no real consensus today on supporting a
   recording capability

   robert: have not heard a compelling reason to support recording

   burn: consensus not to do it now

   milan: would like a standard way to do it, should the need arise

   burn: we could state that we reserve this for the future

   milan: there should be some consistent and portable way to do this
   across engines

   robert: could be done as a proprietary extension

   milan: at least provide a consistent hack, like builtin:record

   robert: that's what the garbage model recording would be

   milan: that's fine, as long as all engines support this type of
   garbage model

   robert: to summarize, we can't agree on specific recording scenarios
   ... but we should agree on supporting the garbage-model recording
   scenario

   burn: as a group, agree not to define an explicit recording
   capability at this time.
   ... recording can be supported using a garbage model, or by
   capabilities defined outside this group

what are the built-ins, and what does that mean?

   milan: existing builtins: dictation, search, address, numbers

   robert: already agreed there should be a certain set of predefined
   grammars
   ... so how do we refer to those?

   burn: two things make builtins interesting: (1) parameterization;
   (2) no language is required

   milan: HTML5 markup already has its own defined input types,
   parameters, etc. It would make sense to pay attention to that here

   burn: an unconstrained text box should naturally bind to a dictation
   model

   milan: should we remap the names of the builtins?

   burn: argue strongly for using html as a starting point

   robert: These should be builtins, not re-used vxml grammars

   <smaug> could someone paste a link to VoiceXML's builtin grammars?

   charles: they've become a de facto standard; not supporting them is
   awkward

   <Robert> these are the HTML input types:
   [11]http://www.w3.org/TR/html5/the-input-element.html#attr-input-type

     [11] http://www.w3.org/TR/html5/the-input-element.html#attr-input-type

   burn: if someone wants to support legacy builtins in a way that
   doesn't break existing builtins, that's not a problem

   <Robert> perhaps have builtins that match these

   milan: there needs to be some way to include these
   ... is there something about this that can't be represented by a
   query string?

   michael: do you want to reference, for example, an html number type,
   or some arbitrary number?

   milan: easier to use old builtins & augment them

   charles: need to look at greater good of using html vs vxml

   <mbodell> Widely implemented? See
   [12]http://en.wikipedia.org/wiki/URI_scheme

     [12] http://en.wikipedia.org/wiki/URI_scheme

   burn: michael, how would you reference grammars that are associated
   with HTML input types?

   michael: an HTML ruleref, with various attributes; or don't specify a
   URI and reference them by markup AP...
   ... most important is associating grammars with individual input
   elements
   ... not a strong use case for having URIs for these things, or for
   the ability of users to write their own grammars that reference them

   burn: when people want to hack something up quickly, common input
   types should lend themselves to being included as part of a larger
   utterance

   michael: there may be other ways to specify input for that type of
   scenario

   burn: maybe reference not the grammar but the input type itself

   charles: similar input types do not always require the same grammar

   burn: but the app author may want a way to link these different
   types of builtin grammars together

   milan: perhaps just do the proposal

   burn: who on the call is interested in builtin models?

   charles: interested in it; this group seems focused on web search
   and dictation, as opposed to broader html cases

   <mbodell> <input type="search" name="q" speech required
   onspeechchange="startSearch">

   michael: there will probably be a standard set of grammar libraries,
   though perhaps the market will provide those

   johnston: can't see us requiring something like a "zip code" lib,
   for internationalization reasons

   michael: HTML has already handled a lot of these issues

   milan: should there be an html binding?

   michael: would be better if you could speech enable certain input
   types with little work

   robert: if no builtins were specified, what are the consequences?

   burn: if you want broad adoption and usage, it needs to be as easy to
   create simple apps as it is in VoiceXML

   robert: we need it to do the html binding.
   ... so how much do we need the html binding part?

   milan: definitely need the capability to specify search, dictation,
   etc.

   robert: that's different from looking at HTML input types, etc.;
   that's a complex problem

   milan: would like to have a notion of how to solve binding problem
   before we do dictation

   robert: does anyone have a proposal to volunteer?

   milan: perhaps can do it after I get the dictation stuff out

   michael: there is a topic in the API about markup bindings.

   burn: true that it's a binding issue
   ... without a proposal, it doesn't happen.
   ... so it will be up to someone to write a proposal

   milan: perhaps send a message to Google about this

   robert: or to satish

   burn: action item for milan to talk with satish and ask for help on
   structuring a proposal
   ... reminder: no call next week

   robert: but there will be a protocol meeting
Received on Tuesday, 28 June 2011 22:10:58 UTC