- From: Dan Burnett <dburnett@voxeo.com>
- Date: Fri, 30 Sep 2011 10:22:51 -0400
- To: public-xg-htmlspeech@w3.org
Group,

The minutes from yesterday's call are available at
http://www.w3.org/2011/09/29-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Dan Druta for taking the minutes.

-- dan

**********************************************************************************

HTML Speech Incubator Group Teleconference

29 Sep 2011

[2]Agenda
   [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0042.html

See also: [3]IRC log
   [3] http://www.w3.org/2011/09/29-htmlspeech-irc

Attendees

Present
   Dan_Burnett, Bjorn_Bringert, Michael_Bodell, Olli_Pettay, Milan_Young, Debbie_Dahl, Dan_Druta, Patrick_Ehlen, Charles_Hemphill, Robert_Brown, Glen_Shires, Michael_Johnston

Regrets

Chair
   Michael_Bodell

Scribe
   Dan_Druta

Contents

  * [4]Topics
     1. [5]Web API
     2. [6]TTS
  * [7]Summary of Action Items
_________________________________________________________

Web API

<mbodell> Discussing SpeechInputResults
<mbodell> Bjorn's mail: [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html
   [8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html
<mbodell> Satish's proposal: [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html
   [9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html
mbodell: speech results discussion
<mbodell> My mail: [10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0043.html
   [10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0043.html
bringert: I did not understand what the semantics were
mbodell: Three interfaces like in Bjorn's proposal
... You get a speech result and inside you get an array of results
bringert: so the results are a history
burn: So the result number will not decrease
mbodell: It can decrease if you get corrections
bringert: what is the benefit of having everything?
burn: It's not the history.
It's the combined one
mbodell: in a simple case you get a concatenation
bringert: How do you know if this is preliminary?
mbodell: we can add a boolean
Charles: Don't you want to know if it's final?
burn: when you get a Boolean in the event that is marked as final it means that particular indexed value in the results array is final
Milan: I can see results coming finalized but the recognizer shuffling them around
bringert: It's up to the recognizer
Milan: let's have an example that shows that chunking
bringert: Why can't we have a preliminary that replaces the previous one until it hits final?
Milan: this is more powerful and gives more flexibility
... I thought the proposal was simpler, as was discussed in the F2F
... Why not have the recognizer send another complete hypothesis, replacing all preceding results?
... The combined result would be more efficient (fewer headers)
mbodell: result chunk size is not necessarily the same as finalizing chunk size
Milan: If you are to dictate an email, are we expecting to have lots of indexes?
... we have an unrealistic example here
bringert: Michael, can you explain the chunking a bit better?
... From a single result you get a single piece of semantics
mbodell: What can't you do with the array?
... in a non continuous world it's not a problem
bringert: how do you see this from the UI point of view?
mbodell: you concatenate them each time
... The intent was to have the exact sequence
... if you have 3 results and one gets modified you get them
... you get a normal result anyway
ddahl: how does it work with nbest?
mbodell: I can see an API where you have results and one is wrong and gets replaced
... I agree it solves the simple use cases and it is more complicated for others
bringert: How do I know when to interpret the actions? We should get a final for everything
Charles: the UI wants to show just finals.
You want finals to come at a reasonable pace
mbodell: there's a tradeoff
<burn> s/Charles: the UI/Charles: maybe the UI/
Milan: I know it's not going to change but maybe there's an external input that might change it
... if the user changes one word or another it might trigger more changes and might send an updated array
burn: the reason I like final is that I can archive them
glen: If you have a command based on what the user said it might not be undoable
... there's the use case where the user doesn't care about preliminary. They just need the final
mbodell: this is more for online improvements
glen: the rerecognize should solve our problem
... if you have an 8 hour dictation and you have a correction in the first sentences, are you going to send the whole 8 hours?
mbodell: we should put a limit. We can solve this in the protocol
Milan: All I'm asking for is a definition of final
bringert: we should define final as something that never changes
Milan: if we add a proprietary API it would be a spec violation
mbodell: I'm not sold on final being final but I'm OK
Milan: what would be the language in the spec then?
bringert: Final will be final and in the future we would add a correction event
... we would add another callback
mbodell: it is possible we can represent the result array as a read-only array in the result event
<smaug> evt.results would become evt.target.results
bringert: I'm fine with the way it is proposed right now
... would this be only in continuous?
mbodell: in one shot the index would not be larger than one
bringert: Maybe we should send different events for continuous and for one shot
... for one shot works and have to have a boolean
... this is more the state of the request
... the nice thing about having the in the request is that you don't have to look in the event
... do we have any outstanding issues?
glen: what's outstanding is what the reco object would look like.
Working on the proposal

TTS

<mbodell> TTS element section of proposal: [11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#tts-section
   [11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#tts-section
<mbodell> TTS JS API (not really filled in): [12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#speechoutputrequest-section
   [12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#speechoutputrequest-section
bringert: this is basically extending the HTML5 media element
mbodell: the media element is missing the mark
bringert: in the proposal we submitted I added another attribute "last mark"
<bringert> Bjorn's TTS proposal: [13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html
   [13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html
mbodell: the time mark event is something new?
bringert: the last mark is new
... you seek by time. Should last mark update then?
mbodell: in VoiceXML there's something similar
bringert: you will only update when you get a last mark
... if you want clock time you can store the marks yourself
... What if I want to send text? I guess I can use a data URI
... in my proposal I had a value element
... I would propose to keep the value element in the spec
mbodell: the issue is that media requires a source
bringert: it's a media element we extend
... we have the use case where we need to send the text
... I'm fine with the way it is right now
mbodell: If we can avoid it, that would be great
bringert: we should loop in the HTML5 WG for advice
... there is more language that has to be added
<mbodell> add "Implementations should support at least UTF-8 encoded text/plain and application/ssml+xml." and other likewise text
mbodell: do we need a SpeechOutput object?
burn: SSML 1 or SSML 1.1?
bringert: 1.1 is an extension, right?
... We should state that implementations should support 1 and 1.1
burn: 1.1 gives more flexibility and has more intelligence
... when is a good time to start the sync discussion between API and protocol?
mbodell: next week
Robert: it would be better if people send questions beforehand
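
The result model discussed under Web API (an indexed array of results, each chunk eventually marked final and not changed afterwards, with the UI concatenating the chunks) might be sketched roughly as follows. This is only an illustration of the discussion, not the group's API: `ResultChunk`, `ResultList`, `update`, `transcript` and `finalized` are hypothetical names, and silently rejecting updates to a final chunk is an assumption based on the "final is something that never changes" position.

```typescript
// Hypothetical sketch; these names do not come from the group's proposals.
interface ResultChunk {
  text: string;   // current best hypothesis for this chunk
  final: boolean; // once true, this chunk is assumed never to change
}

class ResultList {
  private chunks: ResultChunk[] = [];

  // Store a new hypothesis for the chunk at `index`.
  // Returns false (and changes nothing) if that chunk is already final.
  update(index: number, text: string, final: boolean): boolean {
    const existing = this.chunks[index];
    if (existing !== undefined && existing.final) {
      return false;
    }
    this.chunks[index] = { text, final };
    return true;
  }

  // What a UI would display: all chunks received so far, concatenated in order.
  transcript(): string {
    return this.chunks
      .filter((c) => c !== undefined)
      .map((c) => c.text)
      .join(" ");
  }

  // Only the finalized chunks, for example for archiving.
  finalized(): string[] {
    return this.chunks
      .filter((c) => c !== undefined && c.final)
      .map((c) => c.text);
  }
}

const results = new ResultList();
results.update(0, "hello", false);        // preliminary
results.update(0, "hello world", true);   // replaces the preliminary, now final
results.update(1, "how are", false);      // next chunk, still preliminary
results.update(0, "yellow world", false); // rejected: chunk 0 is already final

console.log(results.transcript()); // hello world how are
console.log(results.finalized());  // the single finalized chunk, "hello world"
```

In a one-shot recognition the same structure would only ever hold index 0, which matches the remark that in one shot the index would not grow. A later correction event, as suggested in the discussion, would need a separate path that is allowed to touch final chunks.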
Received on Friday, 30 September 2011 14:23:32 UTC