- From: Dan Burnett <dburnett@voxeo.com>
- Date: Fri, 29 Apr 2011 08:45:20 -0400
- To: public-xg-htmlspeech@w3.org
The minutes are available at http://www.w3.org/2011/04/28-htmlspeech-minutes.html . For convenience, a text version follows.

Thanks to Robert Brown for taking minutes!

-- dan

******************************************************************************

[2]Agenda

[2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

See also: [3]IRC log

[3] http://www.w3.org/2011/04/28-htmlspeech-irc

Attendees

Present
  Dan_Burnett, Olli_Pettay, Robert_Brown, Charles_Hemphill, Milan_Young, Debbie_Dahl, +1.818.237.aaaa, Bjorn_Bringert, Michael_Johnston, Raj_Tumuluri, Patrick_Ehlen, Michael_Bodell

Regrets
  Dan Druta

Chair
  Dan Burnett

Scribe
  Robert Brown

Contents

* [4]Topics
  1. [5]F2F logistics
  2. [6]updated report draft
  3. [7]new design decisions
* [8]Summary of Action Items
_________________________________________________________

<burn> trackbot, start telcon
<trackbot> Date: 28 April 2011
<burn> Scribe: Robert Brown
<burn> ScribeNick: Robert
<burn> Agenda: [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

[9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html

F2F logistics

Bjorn: nothing new logistically
Burn: will send revised schedule

updated report draft

<burn> final report draft: [10]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html

[10] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html

burn: no new comments

new design decisions

Bjorn: previously only looked at intersection of proposals, is there anything that's in two proposals but not the third. e.g. continuous recognition
Milan: any requirement that we support this?
burn: will add continuous recognition to the list of topics to discuss
Bjorn: only removed it from Google proposal because difficult to do, and may want to do it in a later version
Michael: recapped two scenarios stated by Bjorn: 1) continuous speech; 2) open mic
Bjorn: proposed that we all agree this is a requirement
Milan: we were vague about what the interim events requirement meant, whether it included results
<bringert> burn: satish is trying to join, but zakim says the conference code isn't valid
Burn: [after discussion] proposes Michael adds this as a new requirement (or requirements) to the report
Michael: sure, but will also check to see whether we just need to clarify an existing requirement
Bjorn: this is also a design topic
<satish> burn: will do
Bjorn: Robert, is there anything else in the Microsoft proposal that should be considered as a design decision?
Robert: nothing apparent, will review again in coming week
Bjorn: should we start work on a joint proposal then?
Burn: proposes that we now go to the list of issues to discuss and discuss them
Bjorn: more items for discussion from Microsoft proposal
... MS proposal supports multiple grammars, but Google & Mozilla only support one
Olli: Mozilla proposal allows multiple parallel recognitions, each with its own grammar
MichaelJohnston: can't reference an SLM from SRGS, so multiple grammars are required
Bjorn: proposes topic: Should we support multiple simultaneous grammars?
... proposes topic: which timeout parameters should we have?
<smaug_> yeah, Mozilla proposal should have some timouts
<smaug_> timeouts
Bjorn: emulating speech input is a requirement, but it's only present in the Microsoft proposal
Michael: proposes topic: some way for the application to provide feedback information to the recognizer
Bjorn: does anybody disagree that this is a requirement we agree on?
Burn: proposes requirement: "it must be possible for the application author to provide feedback on the recognition result"
Debbie: need to discuss the result format
Michael: seems like general agreement on EMMA, with notion of other formats available
Olli: EMMA as a DOM document? Or as a JSON object?
MichaelJohnston: multimodal working group has been discussing JSON representations of EMMA
... there are some issues, such as losing element/attribute distinction
... straight translation to JSON is a little ugly
Michael: existing proposals include simple representations as alternatives to EMMA
MichaelJohnston: For more nuanced things, let's not reinvent solutions to the problems EMMA already solves
Milan: would rather not have EMMA mean XML, since that implies the app needs a parser
Debbie: sounds like we agree on EMMA, but need to discuss how it's represented, simplified formats, etc
Milan: a good idea to agree that an EMMA result available through a DOM object is a baseline agreement
Bjorn: it's okay to provide the EMMA DOM, but we should also have the simple access mechanism that all three proposals have
Burn: would rather have XML or JSON, but not the DOM
Michael: if you have XML, you can feed it into the DOM
Burn: it's a minor objection, if everybody else agrees on the DOM, I'm okay with that
Bjorn: maybe just provide both
MichaelJohnston: EMMA will also help with more sophisticated multimodal apps, for example using ink. The DOM will be more convenient to work with.
Burn: proposed agreement: "both DOM and XML text representations of EMMA must be provided"
... haven't necessarily agreed that that is all
Bjorn: we already appear to agree, based on proposals: "recognition results must also be available in the javascript objects where the result is a list of recognition result items containing utterance, confidence and interpretation."
Michael: may need to be tweaked to accommodate continuous recognition
Burn: add "at least" to Bjorn's proposed requirement ...
added a statement "note that this will need to be adjusted based on any decision regarding support for continuous recognition"
Milan: would like to add a discussion topic around generic parameters to the recognition engine
Burn: related to existing topic on the list, but will add
Milan: also need to agree on standard parameters, such as speed-vs-accuracy
Burn: will generalize the timeouts discussion to include other parameters
MichaelJohnston: which parameters should be expressed in the javascript API, and what can go in the URI? What sorts of conflicts could occur?
Bjorn: URI parameters are engine specific
MichaelJohnston: for example, if we agreed that the way standard parameters are communicated is via the URI, they could come from the URI, or from the Javascript
Michael: need to discuss the API/protocol to the speech engine, and how standard parameters are conveyed
Bjorn: we need to discuss the protocol, it's not in the list
Burn: will add it to the list
Milan: are the grammars referred to by HTTP URI?
Burn: existing requirement says "uri" which was intended to represent URLs and URNs
Milan: would like to mandate that HTTP was for sure supported. there are lots of others that may work.
Robert: should we have a standard set of built-in grammars/topics?
Bjorn: in the Google proposal we had "builtin:" URIs
Burn: "a standard set of common tasks/grammars should be supported. details TBD"
... need a discussion topic about what these are
Robert: what about inline grammars?
Bjorn: data URIs would work for that, and perhaps we should agree about that
Charles: would like to see inline grammars remain on the table
Burn: will add a discussion about inline grammars
... we all agree on the functionality that inline grammars would give
MichaelJohnston: one target user is "mom & pop developers" who would provide simple grammars
Burn: discussion topic: "what is the mechanism for authors to directly include grammars within their HTML document? Is this inline XML, data URI or something else?"
Robert: use case: given that HTML5 supports local storage, the data from which a grammar is constructed may only be located on the local device
Bjorn: proposes that we mandate data URIs, just for consistency with the rest of HTML
Burn: no objections, so will record as an agreement
Michael: need to discuss the ability to do re-recognition
Burn: related to the topic of recognition from a file
Bjorn: both are fine discussion topics
Burn: [discussion about whether there's anything to discuss around endpointing], already implied in existing discussion topic
Bjorn: context block?
Burn: discussion topic: "do we need a recognition context block capability?" and if we end up deciding yes, we'll discuss the mechanism
Milan: how do we specify a default recognizer?
Bjorn: don't specify it at all
... since it's the default
Michael: need some canonical string to specify user agent default, so we could switch back to it (could be empty string)
... Whereas how we specify a local one may be similar to the way to specify the remote engine
Bjorn: for local engines do we need to specify the engine or the criteria?
Burn: SSML does it this way
Bjorn: is there a use case for specifying criteria?
Burn: in Tropo API, language specification can specify a specific engine
... this is a scoping issue. e.g. in SSML a voice is used in the scope of the enclosing element
... in HTML could say that the scope is the input field, or the entire form
Bjorn: in all the proposals, scoping is to a javascript object
... are there any other criteria for local recognizers than speed-vs-accuracy?
Charles: different microphones will have different profiles
Raj: how do we discover characteristics of installed engines
Michael: selection = discovery?
Burn: in SSML, some people wanted discovery
Bjorn: use cases?
Michael: selection of existing acoustic and language models
Robert: there's a blurry line between what a recognizer is, and what a parameter is
Michael: topic: "how to specify default recognition"
... topic: "how to specify local recognizers"
... topic: "do we need to specify engines by capability?"
Raj: or "how do we specify the parameters to the local recognizer?"
Burn: want to back up to "what is a recognizer, and what parameters does it need?"
... call something a recognizer, and call other things related to that a recognizer
Bjorn: the API probably doesn't need to specify a recognizer. speech and parameters go somewhere and results come back
Burn: what is the boundary between selecting a recognizer and selecting the parameters of a recognizer
Milan: we need to discuss audio streaming
Burn: topic: "do we support audio streaming and how?"
<Milan> Milan: Let's discuss audio streaming
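
Two of the agreements recorded above lend themselves to a small illustration: inline grammars carried as data URIs, and results exposed as a list of items with at least utterance, confidence and interpretation. The TypeScript below is only a sketch under stated assumptions. The names RecognitionResultItem, grammarToDataUri and bestResult are hypothetical and do not come from the Google, Mozilla or Microsoft proposals; the use of the application/srgs+xml media type in the data URI and the 0..1 confidence range are also assumptions, since the minutes fix neither.

```typescript
// Hypothetical shape of one recognition result item. The minutes only agree
// that items carry "at least" utterance, confidence and interpretation.
interface RecognitionResultItem {
  utterance: string;
  confidence: number; // assumed to lie in 0..1; the minutes do not say
  interpretation: unknown; // semantic result; its structure is not specified
}

// Assumed media type prefix for an SRGS XML grammar carried in a data URI.
const PREFIX = "data:application/srgs+xml,";

// Encode an inline grammar as a percent-encoded data URI.
function grammarToDataUri(srgsXml: string): string {
  return PREFIX + encodeURIComponent(srgsXml);
}

// Return the item with the highest confidence, or undefined for an empty list.
function bestResult(
  items: RecognitionResultItem[]
): RecognitionResultItem | undefined {
  let best: RecognitionResultItem | undefined;
  for (const item of items) {
    if (best === undefined || item.confidence > best.confidence) {
      best = item;
    }
  }
  return best;
}

// A simple yes/no grammar of the kind a small site might write by hand.
const grammar =
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" ' +
  'xml:lang="en-US" root="answer">' +
  '<rule id="answer"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  "</grammar>";
const grammarUri = grammarToDataUri(grammar);

// Illustrative result list; the values are made up for the example.
const results: RecognitionResultItem[] = [
  { utterance: "no", confidence: 0.31, interpretation: { answer: false } },
  { utterance: "yes", confidence: 0.87, interpretation: { answer: true } },
];
const top = bestResult(results);

console.log(grammarUri.slice(0, PREFIX.length));
console.log(top ? top.utterance : "(no result)");
```

Because the grammar travels inside the URI itself, this approach would also cover Robert's use case of a grammar built from data that exists only in local storage on the device.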
Received on Friday, 29 April 2011 12:45:49 UTC