- From: Dan Burnett <dburnett@voxeo.com>
- Date: Fri, 29 Apr 2011 08:45:20 -0400
- To: public-xg-htmlspeech@w3.org
The minutes are available at http://www.w3.org/2011/04/28-htmlspeech-minutes.html.
For convenience, a text version follows. Thanks to Robert Brown for
taking minutes!
-- dan
******************************************************************************
[2]Agenda
[2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html
See also: [3]IRC log
[3] http://www.w3.org/2011/04/28-htmlspeech-irc
Attendees
Present
Dan_Burnett, Olli_Pettay, Robert_Brown, Charles_Hemphill,
Milan_Young, Debbie_Dahl, +1.818.237.aaaa, Bjorn_Bringert,
Michael_Johnston, Raj_Tumuluri, Patrick_Ehlen, Michael_Bodell
Regrets
Dan Druta
Chair
Dan Burnett
Scribe
Robert Brown
Contents
* [4]Topics
1. [5]F2F logistics
2. [6]updated report draft
3. [7]new design decisions
* [8]Summary of Action Items
_________________________________________________________
<burn> trackbot, start telcon
<trackbot> Date: 28 April 2011
<burn> Scribe: Robert Brown
<burn> ScribeNick: Robert
<burn> Agenda:
[9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/
0059.html
[9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html
F2F logistics
Bjorn: nothing new logistically
Burn: will send revised schedule
updated report draft
<burn> final report draft:
[10]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech
-20110426.html
[10] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html
burn: no new comments
new design decisions
Bjorn: previously we only looked at the intersection of the proposals;
is there anything that's in two proposals but not the third? e.g.
continuous recognition
Milan: any requirement that we support this?
burn: will add continuous recognition to the list of topics to
discuss
Bjorn: only removed it from the Google proposal because it's difficult
to do, and may want to do it in a later version
Michael: recapped two scenarios stated by Bjorn: 1) continuous
speech; 2) open mic
Bjorn: proposed that we all agree this is a requirement
Milan: we were vague about what the interim events requirement
meant, whether it included results
<bringert> burn: satish is trying to join, but zakim says the
conference code isn't valid
Burn: [after discussion] proposes that Michael add this as a new
requirement (or requirements) to the report
Michael: sure, but will also check to see whether we just need to
clarify an existing requirement
Bjorn: this is also a design topic
<satish> burn: will do
Bjorn: Robert is there anything else in the Microsoft proposal that
should be considered as a design decision?
Robert: nothing apparent, will review again in coming week
Bjorn: should we start work on a joint proposal then?
Burn: proposes that we now go to the list of issues to discuss and
discuss them
Bjorn: more items for discussion from Microsoft proposal
... MS proposal supports multiple grammars, but Google & Mozilla
only support one
Olli: Mozilla proposal allows multiple parallel recognitions, each
with its own grammar
MichaelJohnston: can't reference an SLM from SRGS, so multiple
grammars are required
Bjorn: proposes topic: Should we support multiple simultaneous
grammars?
... proposes topic: which timeout parameters should we have?
<smaug_> yeah, Mozilla proposal should have some timeouts
Bjorn: emulating speech input is a requirement, but it's only
present in the Microsoft proposal
Michael: proposes topic: some way for the application to provide
feedback information to the recognizer
Bjorn: does anybody disagree that this is a requirement we agree on?
Burn: proposes requirement: "it must be possible for the application
author to provide feedback on the recognition result"
Debbie: need to discuss the result format
Michael: seems like general agreement on EMMA, with notion of other
formats available
Olli: EMMA as a DOM document? Or as a JSON object?
MichaelJohnston: multimodal working group has been discussing JSON
representations of EMMA
... there are some issues, such as losing element/attribute
distinction
... straight translation to JSON is a little ugly
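The element/attribute distinction mentioned above can be sketched as follows. This is a hypothetical naive mapping for illustration, not an actual EMMA-JSON proposal from the multimodal working group:

```javascript
// In XML, confidence (an attribute) and literal (a child element) are distinct:
//   <interpretation confidence="0.9"><literal>boston</literal></interpretation>
// A naive JSON translation flattens both into plain object keys,
// losing the attribute-vs-element distinction:
const naiveJson = {
  interpretation: {
    confidence: "0.9", // was an XML attribute
    literal: "boston"  // was an XML child element
  }
};
// Round-tripping back to XML now requires out-of-band knowledge
// of which keys were attributes and which were elements.
```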
Michael: existing proposals include simple representations as
alternatives to EMMA
MichaelJohnston: For more nuanced things, let's not reinvent
solutions to the problems EMMA already solves
Milan: would rather not have EMMA mean XML, since that implies the
app needs a parser
Debbie: sounds like we agree on EMMA, but need to discuss how it's
represented, simplified formats, etc.
Milan: a good idea to agree that an EMMA result available through a
DOM object is a baseline agreement
Bjorn: it's okay to provide the EMMA DOM, but we should also have
the simple access mechanism that all three proposals have
Burn: would rather have XML or JSON, but not the DOM
Michael: if you have XML, you can feed it into the DOM
Burn: it's a minor objection, if everybody else agrees on the DOM,
I'm okay with that
Bjorn: maybe just provide both
MichaelJohnston: EMMA will also help with more sophisticated
multimodal apps, for example using ink. The DOM will be more
convenient to work with.
Burn: proposed agreement: "both DOM and XML text representations of
EMMA must be provided"
... haven't necessarily agreed that that is all
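A hedged illustration of the "XML text representation" half of this agreement. The namespace and element names follow the EMMA 1.0 Recommendation; the content itself is invented for the example:

```javascript
// A minimal EMMA 1.0 result as XML text. In a browser, the DOM form
// could be obtained from this text via:
//   new DOMParser().parseFromString(emmaXml, "application/xml")
const emmaXml =
  '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">' +
    '<emma:interpretation id="int1" emma:confidence="0.9">' +
      '<emma:literal>flights to boston</emma:literal>' +
    '</emma:interpretation>' +
  '</emma:emma>';
```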
Bjorn: we already appear to agree, based on proposals: "recognition
results must also be available in the javascript objects where the
result is a list of recognition result items containing utterance,
confidence and interpretation."
Michael: may need to be tweaked to accommodate continuous
recognition
Burn: add "at least" to Bjorn's proposed requirement
... added a statement "note that this will need to be adjusted based
on any decision regarding support for continuous recognition"
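A minimal sketch of the agreed result shape, with at least utterance, confidence, and interpretation per item. Property names and values are hypothetical, since the group has not yet fixed an API surface:

```javascript
// Hypothetical shape: a list of recognition result items, best-first,
// each carrying at least utterance, confidence, and interpretation.
const recognitionResult = [
  { utterance: "flights to boston", confidence: 0.92,
    interpretation: { destination: "BOS" } },
  { utterance: "lights to boston", confidence: 0.41,
    interpretation: null }
];

// A simple application would typically use only the top hypothesis:
const best = recognitionResult[0];
```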
Milan: would like to add a discussion topic around generic
parameters to the recognition engine
Burn: related to existing topic on the list, but will add
Milan: also need to agree on standard parameters, such as
speed-vs-accuracy
Burn: will generalize the timeouts discussion to include other
parameters
MichaelJohnston: which parameters should be expressed in the
javascript API, and what can go in the URI? What sorts of conflicts
could occur?
Bjorn: URI parameters are engine specific
MichaelJohnston: for example, if we agreed that the way standard
parameters are communicated is via the URI, they could come from the
URI, or from the Javascript
Michael: need to discuss the API/protocol to the speech engine, and
how standard parameters are conveyed
Bjorn: we need to discuss the protocol, it's not in the list
Burn: will add it to the list
Milan: are the grammars referred to by HTTP URI?
Burn: existing requirement says "uri" which was intended to
represent URLs and URNs
Milan: would like to mandate that HTTP is definitely supported; there
are lots of others that may work
Robert: should we have a standard set of built-in grammars/topics?
Bjorn: in the Google proposal we had "builtin:" URIs
Burn: "a standard set of common tasks/grammars should be supported.
details TBD"
... need a discussion topic about what these are
Robert: what about inline grammars?
Bjorn: data URIs would work for that, and perhaps we should agree
about that
Charles: would like to see inline grammars remain on the table
Burn: will add a discussion about inline grammars
... we all agree on the functionality that inline grammars would
give
MichaelJohnston: one target user is "mom & pop developers" who would
provide simple grammars
Burn: discussion topic: "what is the mechanism for authors to
directly include grammars within their HTML document? Is this inline
XML, data URI or something else?"
Robert: use case: given that HTML5 supports local storage, the data
from which a grammar is constructed may only be located on the local
device
Bjorn: proposes that we mandate data URIs, just for consistency with
the rest of HTML
Burn: no objections, so will record as an agreement
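A sketch of how an inline grammar could travel in a data: URI under this agreement. The SRGS markup and MIME type are per the SRGS 1.0 spec; the grammar content itself is illustrative:

```javascript
// Illustrative only: a tiny SRGS grammar embedded as a data: URI,
// consistent with the agreement to mandate data URI support.
const srgs =
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="cmd">' +
    '<rule id="cmd"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  '</grammar>';

// application/srgs+xml is the registered MIME type for XML-form SRGS.
const grammarUri = "data:application/srgs+xml," + encodeURIComponent(srgs);
```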
Michael: need to discuss the ability to do re-recognition
Burn: related to the topic of recognition from a file
Bjorn: both are fine discussion topics
Burn: [discussion about whether there's anything to discuss around
endpointing], already implied in existing discussion topic
Bjorn: context block?
Burn: discussion topic: "do we need a recognition context block
capability?" and if we end up deciding yes, we'll discuss the
mechanism
Milan: how do we specify a default recognizer?
Bjorn: don't specify it at all
... since it's the default
Michael: need some canonical string to specify user agent default,
so we could switch back to it (could be empty string)
... Whereas how we specify a local one may be similar to the way to
specify the remote engine
Bjorn: for local engines do we need to specify the engine or the
criteria?
Burn: SSML does it this way
Bjorn: is there a use case for specifying criteria?
Burn: in Tropo API, language specification can specify a specific
engine
... this is a scoping issue. e.g. in SSML a voice is used in the
scope of the enclosing element
... in HTML could say that the scope is the input field, or the
entire form
Bjorn: in all the proposals, scoping is to a javascript object
... are there any other criteria for local recognizers than
speed-vs-accuracy?
Charles: different microphones will have different profiles
Raj: how do we discover characteristics of installed engines
Michael: selection = discovery?
Burn: in SSML, some people wanted discovery
Bjorn: use cases?
Michael: selection of existing acoustic and language models
Robert: there's a blurry line between what a recognizer is, and what
a parameter is
Michael: topic: "how to specify default recognition"
... topic: "how to specify local recognizers"
... topic: "do we need to specify engines by capability?"
Raj: or "how do we specify the parameters to the local recognizer?"
Burn: want to back up to "what is a recognizer, and what parameters
does it need?"
... we could call one thing a recognizer, and treat other things
related to it as parameters of that recognizer
Bjorn: the API probably doesn't need to specify a recognizer. speech
and parameters go somewhere and results come back
Burn: what is the boundary between selecting a recognizer and
selecting the parameters of a recognizer
Milan: we need to discuss audio streaming
Burn: topic: "do we support audio streaming and how?"
Received on Friday, 29 April 2011 12:45:49 UTC