[minutes] 28 April 2011

From: Dan Burnett <dburnett@voxeo.com>
Date: Fri, 29 Apr 2011 08:45:20 -0400
Message-Id: <1257FE80-5785-45A5-ACFB-2F72DF5E78F5@voxeo.com>
To: public-xg-htmlspeech@w3.org
The minutes are available at http://www.w3.org/2011/04/28-htmlspeech-minutes.html.

For convenience, a text version follows.  Thanks to Robert Brown for  
taking minutes!

-- dan


******************************************************************************
    [2]Agenda

       [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

    See also: [3]IRC log

       [3] http://www.w3.org/2011/04/28-htmlspeech-irc

Attendees

    Present
           Dan_Burnett, Olli_Pettay, Robert_Brown, Charles_Hemphill,
           Milan_Young, Debbie_Dahl, +1.818.237.aaaa, Bjorn_Bringert,
           Michael_Johnston, Raj_Tumuluri, Patrick_Ehlen, Michael_Bodell

    Regrets
           Dan Druta

    Chair
           Dan Burnett

    Scribe
           Robert Brown

Contents

      * [4]Topics
          1. [5]F2F logistics
          2. [6]updated report draft
          3. [7]new design decisions
      * [8]Summary of Action Items
      _________________________________________________________

    <burn> trackbot, start telcon

    <trackbot> Date: 28 April 2011

    <burn> Scribe: Robert Brown

    <burn> ScribeNick: Robert

    <burn> Agenda:
    [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

       [9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

F2F logistics

    Bjorn: nothing new logistically

    Burn: will send revised schedule

updated report draft

    <burn> final report draft:
    [10]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html

      [10] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html

    burn: no new comments

new design decisions

    Bjorn: previously we only looked at the intersection of the
    proposals; is there anything that's in two proposals but not the
    third? e.g. continuous recognition

    Milan: any requirement that we support this?

    burn: will add continuous recognition to the list of topics to
    discuss

    Bjorn: only removed it from the Google proposal because it was
    difficult to do, and we may want to do it in a later version

    Michael: recapped two scenarios stated by Bjorn: 1) continuous
    speech; 2) open mic

    Bjorn: proposed that we all agree this is a requirement

    Milan: we were vague about what the interim events requirement
    meant, whether it included results

    <bringert> burn: satish is trying to join, but zakim says the
    conference code isn't valid

    Burn: [after discussion] proposes Michael adds this as a new
    requirement (or requirements) to the report

    Michael: sure, but will also check to see whether we just need to
    clarify an existing requirement

    Bjorn: this is also a design topic

    <satish> burn: will do

    Bjorn: Robert, is there anything else in the Microsoft proposal
    that should be considered as a design decision?

    Robert: nothing apparent, will review again in coming week

    Bjorn: should we start work on a joint proposal then?

    Burn: proposes that we now go through the list of issues and
    discuss them

    Bjorn: more items for discussion from Microsoft proposal
    ... MS proposal supports multiple grammars, but Google & Mozilla
    only support one

    Olli: Mozilla proposal allows multiple parallel recognitions, each
    with its own grammar

    MichaelJohnston: can't reference an SLM from SRGS, so multiple
    grammars are required

    Bjorn: proposes topic: Should we support multiple simultaneous
    grammars?
    ... proposes topic: which timeout parameters should we have?

    <smaug_> yeah, Mozilla proposal should have some timeouts

    Bjorn: emulating speech input is a requirement, but it's only
    present in the Microsoft proposal

    Michael: proposes topic: some way for the application to provide
    feedback information to the recognizer

    Bjorn: does anybody disagree that this is a requirement we agree on?

    Burn: proposes requirement: "it must be possible for the application
    author to provide feedback on the recognition result"

    Debbie: need to discuss the result format

    Michael: seems like general agreement on EMMA, with notion of other
    formats available

    Olli: EMMA as a DOM document? Or as a JSON object?

    MichaelJohnston: multimodal working group has been discussing JSON
    representations of EMMA
    ... there are some issues, such as losing element/attribute
    distinction
    ... straight translation to JSON is a little ugly

    Michael: existing proposals include simple representations as
    alternatives to EMMA

    MichaelJohnston: For more nuanced things, let's not reinvent
    solutions to the problems EMMA already solves

    Milan: would rather not have EMMA mean XML, since that implies the
    app needs a parser

    Debbie: sounds like we agree on EMMA, but need to discuss how it's
    represented, simplified formats, etc.

    Milan: it would be a good baseline agreement that an EMMA result is
    available through a DOM object

    Bjorn: it's okay to provide the EMMA DOM, but we should also have
    the simple access mechanism that all three proposals have

    Burn: would rather have XML or JSON, but not the DOM

    Michael: if you have XML, you can feed it into the DOM

    Burn: it's a minor objection, if everybody else agrees on the DOM,
    I'm okay with that

    Bjorn: maybe just provide both

    MichaelJohnston: EMMA will also help with more sophisticated
    multimodal apps, for example using ink. The DOM will be more
    convenient to work with.

    Burn: proposed agreement: "both DOM and XML text representations of
    EMMA must be provided"
    ... haven't necessarily agreed that that is all
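
    [Scribe note: for illustration only, a minimal EMMA 1.0 result as
    one might be returned; the element and attribute names follow the
    EMMA spec, but the utterances and the application markup inside
    each interpretation are invented.]

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.82"
                         emma:tokens="flights to boston">
      <dest>BOS</dest>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.51"
                         emma:tokens="flights to austin">
      <dest>AUS</dest>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```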

    Bjorn: we already appear to agree, based on proposals: "recognition
    results must also be available in the javascript objects where the
    result is a list of recognition result items containing utterance,
    confidence and interpretation."

    Michael: may need to be tweaked to accommodate continuous
    recognition

    Burn: add "at least" to Bjorn's proposed requirement
    ... added a statement "note that this will need to be adjusted based
    on any decision regarding support for continuous recognition"

    Milan: would like to add a discussion topic around generic
    parameters to the recognition engine

    Burn: related to existing topic on the list, but will add

    Milan: also need to agree on standard parameters, such as
    speed-vs-accuracy

    Burn: will generalize the timeouts discussion to include other
    parameters

    MichaelJohnston: which parameters should be expressed in the
    javascript API, and what can go in the URI? What sorts of conflicts
    could occur?

    Bjorn: URI parameters are engine specific

    MichaelJohnston: for example, if we agreed that the way standard
    parameters are communicated is via the URI, they could come from the
    URI, or from the Javascript

    Michael: need to discuss the API/protocol to the speech engine, and
    how standard parameters are conveyed

    Bjorn: we need to discuss the protocol, it's not in the list

    Burn: will add it to the list

    Milan: are the grammars referred to by HTTP URI?

    Burn: existing requirement says "uri" which was intended to
    represent URLs and URNs

    Milan: would like to mandate that HTTP is definitely supported;
    there are lots of others that may work

    Robert: should we have a standard set of built-in grammars/topics?

    Bjorn: in the Google proposal we had "builtin:" URIs

    Burn: "a standard set of common tasks/grammars should be supported.
    details TBD"
    ... need a discussion topic about what these are

    Robert: what about inline grammars?

    Bjorn: data URIs would work for that, and perhaps we should agree
    about that

    Charles: would like to see inline grammars remain on the table

    Burn: will add a discussion about inline grammars
    ... we all agree on the functionality that inline grammars would
    give

    MichaelJohnston: one target user is "mom & pop developers" who would
    provide simple grammars

    Burn: discussion topic: "what is the mechanism for authors to
    directly include grammars within their HTML document? Is this inline
    XML, data URI or something else?"

    Robert: use case: given that HTML5 supports local storage, the data
    from which a grammar is constructed may only be located on the local
    device

    Bjorn: proposes that we mandate data URIs, just for consistency with
    the rest of HTML

    Burn: no objections, so will record as an agreement
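
    [Scribe note: a sketch of the agreed data: URI mechanism for inline
    grammars; application/srgs+xml is the registered media type for
    XML-form SRGS, and the grammar content itself is invented.]

```javascript
// Illustrative only: wrap a small SRGS grammar in a data: URI so it
// can be passed anywhere a grammar URI is accepted.
const srgs =
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="yesno">' +
  '<rule id="yesno"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  '</grammar>';

// percent-encode the XML so the URI stays well-formed
const grammarUri = "data:application/srgs+xml," + encodeURIComponent(srgs);

console.log(grammarUri.startsWith("data:application/srgs+xml,")); // true
```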

    Michael: need to discuss the ability to do re-recognition

    Burn: related to the topic of recognition from a file

    Bjorn: both are fine discussion topics

    Burn: [discussion about whether there's anything to discuss around
    endpointing], already implied in existing discussion topic

    Bjorn: context block?

    Burn: discussion topic: "do we need a recognition context block
    capability?" and if we end up deciding yes, we'll discuss the
    mechanism

    Milan: how do we specify a default recognizer?

    Bjorn: don't specify it at all
    ... since it's the default

    Michael: need some canonical string to specify user agent default,
    so we could switch back to it (could be empty string)
    ... Whereas how we specify a local one may be similar to the way to
    specify the remote engine

    Bjorn: for local engines do we need to specify the engine or the
    criteria?

    Burn: SSML does it this way

    Bjorn: is there a use case for specifying criteria?

    Burn: in Tropo API, language specification can specify a specific
    engine
    ... this is a scoping issue. e.g. in SSML a voice is used in the
    scope of the enclosing element
    ... in HTML could say that the scope is the input field, or the
    entire form

    Bjorn: in all the proposals, scoping is to a javascript object
    ... are there any other criteria for local recognizers than
    speed-vs-accuracy?

    Charles: different microphones will have different profiles

    Raj: how do we discover characteristics of installed engines

    Michael: selection = discovery?

    Burn: in SSML, some people wanted discovery

    Bjorn: use cases?

    Michael: selection of existing acoustic and language models

    Robert: there's a blurry line between what a recognizer is, and what
    a parameter is

    Michael: topic: "how to specify default recognition"
    ... topic: "how to specify local recognizers"
    ... topic: "do we need to specify engines by capability?"

    Raj: or "how do we specify the parameters to the local recognizer?"

    Burn: want to back up to "what is a recognizer, and what parameters
    does it need?"
    ... i.e. decide what we call a recognizer, and whether the things
    related to it are part of the recognizer or parameters of it

    Bjorn: the API probably doesn't need to specify a recognizer. speech
    and parameters go somewhere and results come back

    Burn: what is the boundary between selecting a recognizer and
    selecting the parameters of a recognizer

    Milan: we need to discuss audio streaming

    Burn: topic: "do we support audio streaming and how?"

    <Milan> Milan: Let's discuss audio streaming
Received on Friday, 29 April 2011 12:45:49 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 29 April 2011 12:45:49 GMT