- From: Dan Burnett <dburnett@voxeo.com>
- Date: Fri, 4 Nov 2011 14:13:28 -0400
- To: public-xg-htmlspeech@w3.org
Group, The minutes from our first day are at http://www.w3.org/2011/11/03-htmlspeech-minutes.html. For convenience, I have pasted a text version below.

-- dan

*****************************

HTML Speech Incubator Group Teleconference
03 Nov 2011

See also: [2]IRC log
[2] http://www.w3.org/2011/11/03-htmlspeech-irc

Attendees

Present
DanB, Michael, Glen, Matt, Robert, Patrick, Avery, Nagesh, Debbie, Bertha, Milan, Rahul, DanD

Regrets

Chair
Daniel_Burnett, Michael_Bodell

Scribe
ddahl_, ddahl

Contents

* [3]Topics
1. [4]Review recently sent examples
2. [5]Robert's example
3. [6]speech-enabled email
4. [7]Milan's example of protocol
5. [8]michael johnston's multimodal use case
6. [9]Charles Hemphill's example
7. [10]Michael Bodell's example 8, translation
8. [11]Debbie's example
9. [12]another example from Charles Hemphill
10. [13]issues
11. [14]Protocol Issues
12. [15]Web API Issues
13. [16]Issue 6
14. [17]Issue 7
15. [18]Issue 8
16. [19]Issue 9
17. [20]Issue 10
18. [21]Issue 11
19. [22]Issue 12
20. [23]Issue 13
21. [24]Issue 14
22. [25]Issue 15
23. [26]Issue 16
24. [27]Issue 17
25. [28]Issue 18
26. [29]Issue 19
27. [30]Issue 20
28. [31]Issue 21
29. [32]Issue 22
30. [33]Issue 23
* [34]Summary of Action Items

_________________________________________________________

<smaug> hi
<smaug> well, who am I then o_O
<smaug> pong
<burn> trackbot, start telcon
<trackbot> Date: 03 November 2011
<Milan> ScribeNick: Milan

Review recently sent examples

<DanD> [35]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#introduction
[35] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#introduction
<mbodell> [36]http://bantha.org/~mbodell/speechxg/example1.html
[36] http://bantha.org/~mbodell/speechxg/example1.html
Michael: Speech Web Search Markup only
Robert: Found addGrammarFrom() is awkward ...
really a hint
Glen: True that input has no grammar
Michael: It's a builtin grammar
Robert: What about deriveGrammarFrom
Glen: It's an append grammar
DanD: Option might be a better example
Michael: Text is a grammar
Robert: Assume q is an object from which a grammar can be derived
<smaug> Nit, <button name="mic" onclick="speechClick()"> is a submit button, so when you click it, the form is submitted. type="button" would fix the problem
DanB: addDerivedGrammar
Debbie: Figure out semantics first
Robert: AddDerivedGrammarFromID
Glen: Also rename q to 'inputField'
... Also from text input type to date or something more constrained
... Need to specify the lack of grammars
... Is this dictation?
Robert: improve example by defaulting to UTF-8
<glen> Section 5.1: when no grammar specified, defaults to builtin:dictation
Robert: Base64 encoding is ugly
... to the point where it is unusable
Michael: Worried about directly inserting XML due to the 8th bit
DanB: Are there already common protocols for inserting strings derived from URLs into local variables?
Glen: Should only be a W3C standard; implementation is orthogonal
Robert: AddFromString() would be nice?
Glen: addStringGrammar() and addElementGrammar()
Avery: Prefer longer name because it's truer to form
<smaug> Couldn't you just prepend "data:application/srgs+xml," to the serialized XML. But anyway, using data urls is kind of hackish, IMO.
Robert: Too many dots to get the interpretation
Milan: Propose addGrammarFromURI()
Robert: Newing up a speech grammar is a better approach
Michael: Let's just raise issues now rather than solve them
Debbie: Example is complex, and gets mixed up with the argument that JS is complex
* laptop?
Michael: Next example from Bjorn
Robert: The example lacks a grammar
<smaug> s/onclick="startSpeech"/onclick="startSpeech(event)"/
<DanD> [37]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0008/web-speech-sample-code.html
[37] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0008/web-speech-sample-code.html
Robert: Need to define what happens when lacking a grammar
Avery: Is there a policy against comments in the examples?
Michael: Planning on adding examples to an appendix
Avery: It's a decent example, as long as it is clear that this instance lacks a grammar
Robert: Example shows default behavior
Rahul: Could also delete the button as a means of shortening the example
<glen> per Avery's suggestion: add a comment "since no grammar is specified and no element is bound, uses default grammar builtin:dictation"
Rahul: Two different ways to perform the same array access
Glen: Should make it consistent in the example
<mbodell> In Bjorn's second example need sir.maxNBest = 2;
<glen> use same notation: s/q.value = event.result.item(0).interpretation;/q.value = event.result[0].interpretation;/
Robert: Intent is to get a text transcript of the user's input
... why are we accessing the interpretation instead of tokens?
Milan: Need to bring this up in the protocol team
<all agreed> to use "utterance" in place of interpretation
Milan: Last two comments should apply here as well
... Should we have company-specific references?
Michael: Prefer example.org
Robert: Is there speech recognition in turn by turn?
Michael: Speech recognition is just destination capture
<smaug> Again, s/onclick="startSpeech"/onclick="startSpeech(event)"/
Robert: The prefer-speak-next instruction should cancel the last instruction
Glen: Thought the purpose of the example was to show interplay between speech and tts?
Michael: TTS play resumes where it last left off
Glen: Way to stop prior play is a good feature ...
we should change this example
<glen> change example to show how to stop, by persisting the tts object and calling stop before adding .text and .play
Michael: Ollie's example next
<mbodell> [38]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0009/htmlspeech_permission_example.html
[38] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0009/htmlspeech_permission_example.html
Michael: First example is just removing unauthorized elements?
... but second example doesn't allow speech input to start
Ollie: Yes
Michael: Can you transition from not authorized to authorized?
Ollie: Should be possible, but the example doesn't do that
... but could also just reload the page
* Going on break now
<inserted> scribe:ddahl_
<scribe> scribe:ddahl

Robert's example

robert: two recognitions in a row, you want to pick your cities based on what state you're in.
<Avery> Actually I think it's based on what state is specified in the first reco, not necessarily what state you're in. A minor nit.
robert: it really should say "interpretation.state", not just "interpretation"
... used push instead of adding things to the array of speech grammars
... a bug on result, should be city; also, sr.onMatch should be sr.onResult
... second example is rereco
... gives grammars to speechInputRequest, then classifies, then does rereco with a specific grammar
glenn: this seems to be a strange use of "interpretation"
robert: there is a huge universe of grammars
rahul: this is identifying one grammar as different from the others
robert: using the attribute "modal" to activate and deactivate grammars
... would change the example to get interpretation.classification ...
strange to have multiple "modals" as true; think modal might be a bad idea

speech-enabled email

michael: one interesting thing is that you might get notifications that you would want to speak to, but without clicking
robert: was mostly thinking about things like "reply", but you could also imagine saying "read it to me" after a notification
... made up a method to cancel TTS
michael: you could just delete the element
robert: what if you set up the element with stuff in it?
glenn: destroy should not be the only way to cancel

Milan's example of protocol

milan: will augment with API calls that trigger protocols
... need a result index of some kind
... then the recognizer decides to change its mind and reorders results
... strange to get a "complete" result in the middle of a long dictation
... result index 0 is the first fragment; then, halfway through the second fragment, the recognizer says the first one is done
... different from MRCP, because in MRCP that means it's the end of it
... then retracts a result; not sure how to represent this, maybe an "IN_PROGRESS" message with no payload
... we will put this in the larger document as an example of the protocol

michael johnston's multimodal use case

<smaug> Could you please paste links to the example here
michael: "I want to go from here to there" is the use case
<smaug> ( would be then easier to read minutes later )
<mbodell> Michael's example: [39]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html
[39] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html
<mbodell> You can walk through the examples from: [40]http://bantha.org/~mbodell/speechxg/f2f.html which links to [41]http://bantha.org/~mbodell/speechxg/examples.html which then walks through the examples
[40] http://bantha.org/~mbodell/speechxg/f2f.html
[41] http://bantha.org/~mbodell/speechxg/examples.html
glenn: it would be good to have a "state" attribute ...
the "nomatch" state is more of a result, not a state
... we may need more than one attribute to get results of speech processing
michael: this also has the EMMA so that you can see the mapping from EMMA
... this example makes use of a remote speech service
<glen> [42]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html
[42] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html
michael: the EMMA shows the combined speech and gui input
robert: this should be a wss:, that is, a web socket protocol, but what should we do if someone uses http?
michael: you could get the command right but not the person if you didn't do the "clickInfo"

Charles Hemphill's example

<glen> [43]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0024.html
[43] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0024.html
danD: we should start with the simplest example

Michael Bodell's example 8, translation

<glen> [44]http://bantha.org/~mbodell/translate.html
[44] http://bantha.org/~mbodell/translate.html
<glen> view-source:[45]http://bantha.org/~mbodell/translate.html
[45] http://bantha.org/~mbodell/translate.html
michael: a different example, of translation
... there are from and to languages; you choose, and then click on the microphone to talk
... there's a progress bar that gets updated
... we're grabbing our language from the selector; we're using a dictation grammar for whatever language we're using
... where are we doing capture?
glen: wouldn't that be the microphone?
michael: not necessarily, there could be other things like media streams
glen: is capture necessary or does it just provide more features?
michael: we didn't have any examples of capture from other places, like from Web RTC ...
right now there's no standard for accessing the microphone
glen: would like to see a default example where we don't have to explicitly do capture
michael: all examples assume that there's magic for capturing audio
glen: can't we make it so that the magic is what happens by default?
dan: there are many security and privacy issues
... different permissions for getting access to media but also to do something to the media
michael: this is also raised in some of our issues; we only have a two-sentence note now
... can TTS work on Web Sockets?
robert: yes
michael: on audio start, etc. are in our spec. another issue is that the payload of start, stop events isn't defined
robert: do we have VU meter events?
michael: no
dan: that came up in Web RTC; they don't have that, but they could create it
michael: we do have speech-x events for custom extensions
robert: most speech apps have one
michael: is that part of the UA or the app?

Debbie's example

multi-slot filling
<mbodell> Debbie's: [46]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0031/Multi-slotSpeech1.html
[46] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0031/Multi-slotSpeech1.html
debbie: in this example you have to pull out the slot values from the EMMA
robert: is this the same as saying "interpretation.booking"?
debbie: not sure
... we don't know what's in "interpretation"
robert: we could get rid of "interpretation"
michael: it could be a useful pointer into the EMMA
... that is available in VXML
<mbodell> Issue: we should make sure it is clear what the interpretation points to
<trackbot> Created ISSUE-1 - We should make sure it is clear what the interpretation points to ; please complete additional details at [47]http://www.w3.org/2005/Incubator/htmlspeech/track/issues/1/edit .
[47] http://www.w3.org/2005/Incubator/htmlspeech/track/issues/1/edit
michael: should do an if to make sure that you really got a value
debbie: could add the EMMA ...
would there be value in some kind of convenience syntax so that you don't need the full DOM generality to manipulate the EMMA result?
<mbodell> Charles' example: [48]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0033.html
[48] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0033.html

another example from Charles Hemphill

michael: the same example as before but with an external grammar
avery: what's the advantage of having the "reco" element as a child under "input"?
michael: there are two different ways to do the same thing; with "reco" as a child under <input> you don't need an id
<smaug> <input> element can't have child elements; actually, input is a child of reco in the proposal
<smaug> My comments to example 3: [49]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0034.html
[49] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0034.html
michael: another example with a real inline grammar so that you don't have to do a data uri
... we would have to define a "grammar" tag
robert: we would have to define for browsers how to interpret SRGS
avery: like putting script in the page vs an external reference
<smaug> Milan: remember, we're talking about HTML here, not XML
<smaug> (I assume that was Milan)
milan: could we say "as long as this is valid XML, ignore it and pass it to us"?
robert: why wrap the whole thing with the grammar element?
michael: if there's an SRGS 1.1, you wouldn't know what version it was, for example
... would like to have the inline grammar, if any, be full SRGS with the <grammar> element
... that is the end of the examples
<Milan> * Good point Ollie
<glen> scribenick: glen

issues

burnett: if we can't agree, it depends on importance. If important, capture different opinions in the doc.
... (not required to resolve everything in an incubator group)
<mbodell> First issue to discuss: [50]http://bantha.org/~mbodell/speechxg/issuep1.html
[50] http://bantha.org/~mbodell/speechxg/issuep1.html
1.
What Content-Type do we want to use on an empty message? Use case was nulling out a previous candidate recognition.
milan: do we have to specify? can it be assumed?
... empty means no payload?
robert: the protocol doesn't require a body
... in which case I don't think it needs a content type. Example: getParams
michael: what about interim results?
... if no content and no content type, then it nulls out the corresponding result. Example: an interim result gets replaced with no result (e.g. if a <cough> is initially recognized as some text)

Protocol Issues

2. I am skeptical about changing established MRCP event/method names. I sort of agree that LISTEN is better than RECOGNIZE, but do not think the reasons are good enough to warrant the ensuing churn.
Robert: Microsoft doesn't care if it's similar to MRCP, rather that it's compatible with our web sockets protocol
burnett: web sockets is just a transport
... violates many types of protocol design
... if standards track, IETF is a logical place
robert: so naming doesn't matter much at this point.
all: agree
burnett: some talk of using SIP to set up; would have to separate signaling and data... which is one thing wrong with this.
robert: this is more to illustrate a point that it can be done
burnett: companies could implement today, and may not be completely interoperable (as is often the case with first implementations)
michael: we agree not to change names right now. Names will likely be re-evaluated in a standards track.
... minor syntax issues can be called out as a note in the doc.
burnett: when it gets into a standards group, they look at requirements and take ideas into consideration, but they consider MANY other factors, e.g. security, that drive
3. We need a way to index the recognition results. I suggest using a Result-Index header
all: agree to add. if a one-shot recognition, it's only [0] and still optional
4. It was awkward to use a RECOGNITION-COMPLETE message presumably with a COMPLETE status during continuous speech.
Instead, I used INTERMEDIATE-RESULT with a new Result-Status header set to final.
robert: just rename RECOGNITION-COMPLETE as RECOGNITION-RESULT
... it's an intermediate, unless it's a final response type.
burnett: MRCP has separate status code and completion code
Milan: we need a complete flag; not sure it was defined. We haven't stated which status codes correspond to which messages.
burnett: in MRCP, status is about communication (like 200 OK). In MRCP, the completion code indicates what happened (e.g. successful reco)
robert: so status indicates "sending more", so status should be in-progress for the continuous reco case.
... need request state?
burnett: a request has been made; has it been completed yet? status is success, illegal method, illegal value, unsupported header
robert: reco result, 200 OK, in progress
5. Perhaps Source-Time should also be required on final results
all: yes, everything's fine, more to come
Milan: by the time you have the final result, you should know the start time.
all: agree, require only on reco result
Milan: could be a reco result with type = pending
michael: pending implies it has already started
robert: in progress is more accurate
all: agree to leave as is
6. Wanted to confirm that channel identification is being handled by the WebSocket container
robert: handled by web socket
... if two separate recos, then two web sockets and two audio streams. (Can have 2 grammars active in one reco)
milan: continuous hotword case
robert: that's continuous reco
... start session with hotword and command-control grammar, all is continuous results
michael: hard if they change over time
... because you have to pause to change
... so not continuous
robert: don't want to transmit audio twice, but with two sessions, you must
avery: does the emma result specify which grammar?
michael: yes
7. I noticed that Completion-Cause was missing from Robert's spec example in section 4.2.
robert: accidental omission, need to add

Web API Issues

1.
To get the reco result I think I have to write "e.result.item(0).interpretation". This is a lot of dots and an index just to get the top result.
robert: I want to write e.interpretation -- because most of the time that's what I want (but still could use the verbose way as well)
<mbodell> Here is the link to where the event is defined: [51]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#speechinputresult
[51] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#speechinputresult
milan: e.result.interpretation
michael: can already use e.result[0].interpretation
glen: we should change utterance to match
all: e.interpretation and e.utterance
... agreed
2. "utterance" has a couple of different meanings in the doc. It's alternatively the recording of what the person said, or the transcript returned by the recognizer.
michael: transcript? text? tokens? text and token are overused and confusing
robert: but it is text, so not overloading the concept
... (unlike token)
burnett: transcript is closest to what's actually happening; laymen get it
glen: text is not descriptive: interpretation is text, whereas transcript vs interpretation is clear
all: agree: rename utterance to transcript
5. The "modal" attribute on SpeechGrammar is unnecessarily restrictive
Discussion: There are cases where I'll want to have multiple grammars active, but not all, and not just one. Developers would be better off with a boolean enabled attribute on each grammar. Would be useful to clarify the behavior when there is more than 1 grammar with this set to true (only the first in the list is active?) Is this even useful at all? What is the case for having grammars which aren't active in the reco? Can we change the state of the modal/u
robert: fewer lines of code if you just set one to true
milan: alternatively, could add/remove from the grammars array
glen: sending all at once allows caching ... of grammars ...
what about the continuous case, can grammars change on the fly
michael: we decided to simplify by re-calling .start to change grammars or anything else
milan: should have a separate way to preload
burnett: voicexml has defineGrammar
milan: a grammar set object on the SpeechInputRequest
... I'm proposing sets of grammars
robert: I'd like it flatter; get rid of enabled/disabled -- just delete -- and don't allow preload
michael: already have .open that allows preloading
<mbodell> See web api: [52]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-open
[52] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-open
<scribe> scribenick: glen
burnett: nervous about this; we discussed this for a long time and considered many edge cases
robert: alternative: get rid of modal and enable, and just use a bunch of grammars
avery: if .open has already been called, .start doesn't call it (.start only calls .open if it hasn't been opened yet)
burnett: wondering if there are performance advantages in the reco engine if you can call enable/disable as opposed to calling .open multiple times?
<smaug> start might need to re-call open if authorizationState has changed from not-authorized to authorized
milan: MRCP didn't solve this, why should we?
... all good MRCP clients do what you're saying automatically; they automatically check for the deltas
robert: this runs at web scale, distributed
... big difference between telephony and web
michael: options: eliminate modal, or keep it and define what happens if multiple are set to true
avery: easier to add later than remove
agree: eliminate .modal
5. The interpretation attribute is completely opaque. That may be necessary given that SISR can pretty much return anything. But it'll need some examples to show how to use it.
burnett: there was support for a flat array of interpretations ...
I didn't like that; Nuance and their customers didn't like it
debbie: use emma to define the layout
michael: different reco engines may use emma in different ways
... fundamentally, .interpretation points to somewhere in the emma, which simplifies (and a corresponding .transcript)
all: agree, specify which part of the emma holds the interpretation
michael: mapped to a DOM object, emma literal or node
... like debbie's slot-filling example
... I will send text for this
5. The array of SpeechGrammar objects is too cumbersome
<smaug> something happened to the audio. it is all just noise
<smaug> though, getting late here
Discussion: Robert: The array of SpeechGrammar objects is too cumbersome. In most cases I'd like to write something simple like: mySR.speechGrammars.push("USstates.grxml","majorUScities.grxml","majorInternationalCities.grxml"); But I can't. I have to new-up a separate object for each one then add it to the array, even when I don't care about the other attributes. Better to just make it an array of URI strings, and add functions for the edge cases. e.g. void ena void setWeight(in DOMString grammarUri, in float weight); And yeah, I remember arguing the opposite on the phone call. But that's before I tried writing sample code.
Glen: "The uri of a grammar associated with this reco. If unset, this defaults to the default builtin uri." Presumably using the grammar attribute overwrites the default grammar, so if a developer wishes to add a grammar that supplements the default grammar, then this alternative should work: re would add clarity.
Michael: If you view source on the web api document you'll see the grammar functions and descriptions are there commented out as I anticipated, and agree, with this comment. We should have both functions and array/collections and this makes the things that Robert and Glen describe much easier/better.
michael: grammar spec after ?
are hints; before builtin: are required, and errors if not supported
... example: builtin:contacts may recognize names in a smartphone
... require builtin:generic
burnett: builtin:generic means I'll take anything you've got: if it's just a date grammar, I'll take it.
<mbodell> We are talking about [53]http://bantha.org/~mbodell/speechxg/issuew5.html but really more about what happens with no grammar
[53] http://bantha.org/~mbodell/speechxg/issuew5.html
milan: builtin:generic could respond with failure, builtin:dictation could also respond with failure
robert: builtin:generic should be builtin:default
... and none specified is builtin:default
burnett: what if you want to use both default and another grammar
glen: then add builtin:default and builtin:foo
michael: default is not the user default, but the service or ua default
milan: want a way to record without a grammar
michael: we define builtin:default, encourage vendors to implement, and state that when none is specified, it's on by default. (and when other grammars are specified, it can also be added.)
... I like .addGrammar(url, weight) as a simplification from creating an object and then setting it
robert: .addGrammarFromUrl(url, weight)
... .addGrammarFromElement(element, weight) .addGrammarFromString(string, weight)
... better yet: .addUrlGrammar .addElementGrammar .addStringGrammar
... but there's an advantage for the objects to be in alphabetical order, grouped together in docs
glen: .addGrammarUrl .addGrammarElement .addGrammarString
... remove is a JavaScript array operation
michael: also .addCustomParameter(name, value)
all: agree: .addGrammarUrl .addGrammarElement .addGrammarString .addCustomParameter
<smaug> I think this is enough for me. I'll read the minutes tomorrow and send comments
<smaug> It is midnight here
<smaug> dark?
it has been dark here for the last 6 hours
<rahul> scribenick: rahul

Issue 6

<mbodell> Link to the current issue: [54]http://bantha.org/~mbodell/speechxg/issuew6.html
[54] http://bantha.org/~mbodell/speechxg/issuew6.html
<glen> 6. The names are a bit long.
<glen> Discussion: e.g. "new SpeechInputRequest()" vs "new SpeechIn()" . e.g. "mySR.speechGrammars.push("foo")" vs "mySR.grammars.push("foo")" . e.g. "resultEMMAXML" vs "EMMAXML" or just "EMMA" (call the other one "EMMAText") e.g. "inputWaveformURI" vs "inputURI"
Milan: how about SpeechRequest instead of SpeechInputRequest?
Robert: SpeechRecognizer?
Milan: AudioSynthesizer?
Glen: SpeechReco?
<Milan> Milan: AudioSynth
<Milan> * test
Resolution: We will use SpeechReco instead of SpeechInputRequest
<matt> [55]Parkinson's Law of Triviality
[55] http://en.wikipedia.org/wiki/Bikeshedding
<scribe> ACTION: Editing team to update to SpeechReco [recorded in [56]http://www.w3.org/2011/11/03-htmlspeech-minutes.html#action01]
<trackbot> Sorry, couldn't find user - Editing

Issue 7

7. SpeechInputRequest.outputToElement() should be an attribute, perhaps 'forElement'
<matt> [57]Issue 7
[57] http://bantha.org/~mbodell/speechxg/issuew7.html
Resolution: Replace the outputToElement() function with the outputElement attribute

Issue 8

<inserted> [58]Issue 8
[58] http://bantha.org/~mbodell/speechxg/issuew8.html
8. SpeechInputResult has a getter "item(index)". SpeechInputResultEvent has an array "SpeechInputResult[] results".
Discussion: Can we change both to be collections similar to [59]http://www.w3.org/TR/FileAPI/#dfn-filelist (accessible via [] operator and optionally with a .item() method)?
[59] http://www.w3.org/TR/FileAPI/#dfn-filelist
Resolution: Accepted

Issue 9

<matt> [60]Issue 9
[60] http://bantha.org/~mbodell/speechxg/issuew9.html
9. The <reco> element should probably be a void element with no content on its own
Discussion: Satish: [61]http://dev.w3.org/html5/spec/Overview.html#void-elements.
I just noticed this in the for attribute's description, missed it in earlier reads: "If the for attribute is not specified, but the reco element has a recoable element descendant, then the first such descendant in tree order is the reco element's reco control." Is there a benefit to doing this over requiring the 'for' attribute to be set and making reco a void element? Charles: I a
[61] http://dev.w3.org/html5/spec/Overview.html#void-elements.
<glen> resolution: can specify with either descendant or with the for= attribute
Resolution: Agreed to leave it as-is, using either the for or the descendant pattern

Issue 10

<matt> <inserted> [62]Issue 10
[62] http://bantha.org/~mbodell/speechxg/issuew10.html
10. TTS is hard
Discussion: Bjorn: I can't see any easy way to do programmatic TTS. The TTS element is at least missing the attributes @text and @lang. Without those, it's pretty hard to do the very simple use case of generating a string and speaking it. It's possible, but you need to build a whole SSML document. For use cases, see the samples I sent earlier today.
Dominic: For TTS, I don't understand where the content to be spoken is supposed to go if it's not specified in
Michael: @lang is not missing since it could be inherited
... there is no @text there
Glen: content within <tts></tts> will show up within older browsers
<mbodell> Discussion is <tts src="data:text/plain,Hello, world"/> versus <tts value="Hello, world"/> versus something else.
Note in JS we could define a function so it is pretty similar, but from markup it's a little harder to get the function creating the data uri (probably still possible)
<glen> 72<tts value="fahrenheit">F</tts>
<glen> michael: tts as a markup may render visually a control (play, stop, etc)
<glen> ... other dom can interact
<glen> glen: most uses of tts need dynamic control -- that is, require javascript
<glen> michael: because tts inherits from media-element, it requires a src attribute
<glen> glen: <img alt="text">
<glen> michael: <tts> is not used as an alternative fallback
Dan: the use case for the <tts> element is to facilitate easy generation as part of markup rather than generating script
Michael: the @lang inherited from the <media> element should be passed as a parameter to the synthesizer
Resolution: Add a @text attribute to <tts>.

Issue 11

-> [65]http://bantha.org/~mbodell/speechxg/issuew11.html Issue 11
[65] http://bantha.org/~mbodell/speechxg/issuew11.html
11. How does binding to button work
Discussion: Satish: "When the recoable element is a button then if the button is not disabled, then the result of a speech recognition is to activate the button." "For button controls (submit, image, reset, button) the act of recognition just activates the input." "For type checkbox, the input should be set to a checkedness of true. For type radiobutton, the input should be set to a checkedness of true, and all other inputs in the radio button group must b
Michael: propose to have an issue note that this needs further thought
Robert: define what we can, and for others say there is no binding
Resolution: Add an issue note that more work is to be done on bindings

Issue 12

12. What about meter, progress, and output elements?
Discussion: Satish: The meter, progress and output elements all seem to be aimed at displaying results and not for taking user input.
Is there a reason why these are included as recoable elements?

Michael: This is specified at Reco Bindings. A person could want to be able to speak and have it change a progress bar or meter or output element. The primary reason is matching what is done with label. These are all labelable elements and thus ended up as recoable

Glen: suggest we not talk about bindings to these

Dan: we need to decide which ones to leave out; I agree, since these are not even <input> elements

Resolution: Remove these from the recoable elements and bindings

Issue 13

13. grammars and parameters should be collections

Discussion:

Satish: Similar to issue 8, SpeechInputRequest attributes 'grammars' and 'parameters' should probably be turned into collections as well

Resolution: Accepted

Issue 14

14. rename language to lang

Discussion:

SpeechInputRequest.language should probably be changed to 'lang' to match lang attributes.

Resolution: Accepted

Issue 15

15. rename interimResults to interimResultsInterval

Discussion:

SpeechInputResult.interimResults should probably be renamed to interimResultsInterval to indicate its usage, similar to how other attributes have 'Timeout' in their names

Resolution: Turn into a boolean property; the name does not change

Issue 16

16. drop enum prefixes

Discussion:

The SPEECH_AUTHORIZATION_ prefix could be dropped for the enums to just have 'UNKNOWN', 'AUTHORIZED' & 'NOT_AUTHORIZED' (similar to XHR States). Same for SPEECH_INPUT_ERR_* and other such enums.

Resolution: Accepted (given Satish's input and expertise)

Issue 17

17. A way to uncheck automatically by speech?

Discussion:

Glen: "For type checkbox, the input should be set to a checkedness of true." It would be nice to have a way to allow the user to say something to set it to false, but I can't think of a good convention for this other than adding an attribute or grammar. Perhaps this could/should only be possible via scripting.
(I don't like the idea of toggling the checkbox because some users may not be able to easily observe what state the checkbox is currently in.)

Resolution: See resolution to issue 11

<smaug> mbodell: I'm kind of online
<smaug> what enum conflicts?
<smaug> if the const is in an interface, then no

Issue 18

<matt> [66]Issue 18

[66] http://bantha.org/~mbodell/speechxg/issuew18.html

18. Binding hints versus requirements

Discussion:

Glen: "For date and time types ... type of color ... type of range the assignment is only allowed if it is a valid ..." On our call we discussed how these grammars are hints, and in particular how pattern may be difficult to implement. We discussed that showing an output response, even an invalid one, may be more valuable than no response.

Michael: We can do hints for patterns on text, and for numbers out of range, but for other types HTML5 is jus

Resolution: See resolution to issue 11

<glen> satish provides this example of two sets of enums, with no prefixes.
<glen> [67]https://developer.mozilla.org/en/DOM/HTMLMediaElement

[67] https://developer.mozilla.org/en/DOM/HTMLMediaElement

Issue 19

19. Does reco and TTS need to be on a server as opposed to client side?

Discussion:

Dominic: The spec for both reco and TTS now allows the user to specify a service URL. Could you clarify what the value would be if the developer wishes to use a local (client-side) engine, if available? Some of the spec seems to assume a network speech implementation, but client-side reco and TTS are very much possible and quite desirable for applications that require extremely low latency, like accessibility in particular. Is there any possibility

<matt> [68]Issue 19

[68] http://bantha.org/~mbodell/speechxg/issuew19.html

<glen> satish continues: HTMLMediaElement.LOADED so no clashes
<glen> (above refers to issue 16)

Resolution: The service does not need to be remote; UAs may define URIs to local engines. We should add clarifying text specifying this.
Also, the serviceURI does not need to be remote. We will clarify this as well.

Issue 20

20. Set lastMark?

Discussion:

Dominic: An earlier draft had the ability to set lastMark, but now it looks like it's read-only, is that correct? That actually may be easier to implement, because many speech engines don't support seeking to the middle of a speech stream without first synthesizing the whole thing.

Michael: Actually the speech xg version has never supported setting a lastMark. You can control playback using the normal media controls (setting currentTime, seekabl

Resolution: Leave as-is right now. Add issue note about also making it writeable.

Issue 21

21. More frequent callbacks?

Discussion:

Dominic: When I posted the initial version of the TTS extension API on the chromium-extensions list, the primary feature request I got from developers was the ability to get sentence, word, and even phoneme-level callbacks, so that got added to the API before we launched it. Having callbacks at ssml markers is great, but many applications require synchronizing closely with the speech, and it seems really cumbersome and wasteful to have to add an s

Resolution: Leave as-is. Suggest as enhancement to SSML.

Issue 22

22. How do we fit with capture/input/MediaStream?

Discussion:

Michael: Our spec has: attribute MediaStream input; but we have nearly no explanation of it and our examples don't show how to use it. Can we do better?

<mbodell> spec link: [69]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-input

[69] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-input

Resolution: This XG probably can't do better. We should have an issue note and include the assumption that media stream input somehow happens. This seems to be of interest to numerous groups (Audio, DAP, Web RTC, HTML Speech XG, ...); Debbie will follow up as part of the HCG.
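Taken together, several of the API resolutions above (issues 13-15 and 19) can be illustrated with a small configuration sketch. All names here are hypothetical, loosely modeled on the draft SpeechInputRequest; this is not the spec's interface:

```javascript
// Hypothetical sketch of a recognition-request configuration reflecting
// the resolutions discussed above:
//   issue 13: grammars and parameters become collections
//   issue 14: "language" is renamed to "lang"
//   issue 15: interimResults stays a boolean property
//   issue 19: serviceURI may be absent (UA default) or point at a
//             UA-defined local engine, not only a remote service
function makeRecoConfig(overrides) {
  const defaults = {
    lang: "en-US",          // issue 14: matches HTML lang attributes
    interimResults: false,  // issue 15: boolean, name unchanged
    serviceURI: null,       // issue 19: null = UA-chosen (possibly local) engine
    grammars: [],           // issue 13: collection
    parameters: [],         // issue 13: collection
  };
  return Object.assign({}, defaults, overrides || {});
}

const cfg = makeRecoConfig({ lang: "fr-FR", interimResults: true });
```

The sketch only models the shape of the attributes after the resolutions; binding it to an actual recognizer was left to the draft API.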
<scribe> ACTION: ddahl2 to set up follow-up via HCG [recorded in [70]http://www.w3.org/2011/11/03-htmlspeech-minutes.html#action02]

<trackbot> Created ACTION-4 - Set up follow-up via HCG [on Deborah Dahl - due 2011-11-11].

Issue 23

23. How does speechend and related events do timing?

<matt> [71]Issue 23

[71] http://bantha.org/~mbodell/speechxg/issuew23.html

Discussion:

Michael: Our spec is missing explanations around the timing and how the information is reflected.

<scribe> Meeting: HTML Speech Incubator Group - 2011 TPAC F2F, Day 1

Resolution: Define the data to reflect the source-time back into the events. Do it on all events that accept time (including result and speech-x). Note this timing is always relative to the "stream-time", and real time may be faster or slower than that.

<kaz> [ Thursday meeting adjourned ]
Received on Friday, 4 November 2011 18:31:39 UTC