[minutes] 29 September 2011 from Dan Burnett on 2011-09-30 (public-xg-htmlspeech@w3.org from September 2011)

From: Dan Burnett <dburnett@voxeo.com>
Date: Fri, 30 Sep 2011 10:22:51 -0400
To: public-xg-htmlspeech@w3.org
Message-Id: <2E7A4F70-2FA2-4BB4-8BAE-AA844AA35E41@voxeo.com>
Group,

The minutes from yesterday's call are available at http://www.w3.org/2011/09/29-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Dan Druta for taking the minutes.

-- dan

**********************************************************************************

              HTML Speech Incubator Group Teleconference

29 Sep 2011

   [2]Agenda

      [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0042.html

   See also: [3]IRC log

      [3] http://www.w3.org/2011/09/29-htmlspeech-irc

Attendees

   Present
          Dan_Burnett, Bjorn_Bringert, Michael_Bodell, Olli_Pettay,
          Milan_Young, Debbie_Dahl, Dan_Druta, Patrick_Ehlen,
          Charles_Hemphill, Robert_Brown, Glen_Shires, Michael_Johnston

   Regrets
   Chair
          Michael_Bodell

   Scribe
          Dan_Druta

Contents

     * [4]Topics
         1. [5]Web API
         2. [6]TTS
     * [7]Summary of Action Items
     _________________________________________________________


Web API

   <mbodell> Discussing SpeechInputResults

   <mbodell> Bjorn's mail:
   [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html

      [8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html

   <mbodell> Satish's proposal:
   [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html

      [9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html

   mbodell: speech results discussion

   <mbodell> My mail:
   [10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0043.html

     [10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0043.html

   bringert: I did not understand what the semantics were

   mbodell: Three interfaces, as in Bjorn's proposal
   ... You get a speech result and inside you get an array of results

   bringert: so the results are a history

   burn: So the result number will not decrease

   mbodell: It can decrease if you get corrections

   bringert: what is the benefit of having everything?

   burn: It's not the history. It's the combined one

   mbodell: in a simple case you get a concatenation

   bringert: How do you know if this is preliminary?

   mbodell: we can add a boolean

   Charles: Don't you want to know if it's final?

   burn: when you get a Boolean in the event that is marked as final it
   means that particular indexed value in the results array is final
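   To make the structure under discussion concrete, here is a minimal
   TypeScript sketch of a result event whose results array carries a
   per-entry final flag and an n-best list per entry. The names
   (SpeechInputResultEvent, resultIndex, alternatives, final) are
   hypothetical illustrations, not the proposal's IDL. The UI text is
   simply the concatenation of each entry's top alternative, as in the
   simple case mbodell described.

```typescript
// Hypothetical shapes for illustration only; not the draft API's actual IDL.
interface SpeechInputAlternative {
  utterance: string;
  confidence: number;
}

interface SpeechInputResult {
  alternatives: SpeechInputAlternative[]; // n-best list, best first
  final: boolean; // true once this entry is final
}

interface SpeechInputResultEvent {
  results: readonly SpeechInputResult[]; // combined result so far, not a history
  resultIndex: number; // lowest index that changed in this event (assumed)
}

// Text a UI might show: the top alternative of each entry, joined with spaces.
function displayText(evt: SpeechInputResultEvent): string {
  return evt.results
    .map((r) => (r.alternatives.length > 0 ? r.alternatives[0].utterance : ""))
    .filter((s) => s.length > 0)
    .join(" ");
}

// Number of leading entries that are already final.
function finalPrefixLength(evt: SpeechInputResultEvent): number {
  let n = 0;
  while (n < evt.results.length && evt.results[n].final) n++;
  return n;
}

const example: SpeechInputResultEvent = {
  results: [
    { alternatives: [{ utterance: "hello", confidence: 0.9 }], final: true },
    {
      alternatives: [
        { utterance: "world", confidence: 0.6 },
        { utterance: "word", confidence: 0.3 },
      ],
      final: false,
    },
  ],
  resultIndex: 1,
};

console.log(displayText(example)); // hello world
console.log(finalPrefixLength(example)); // 1
```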

   Milan: I can see results coming in finalized, but the recognizer
   then shuffling them around

   bringert: It's up to recognizer

   Milan: let's have an example that shows that chunking

   bringert: Why can't we have a preliminary result that replaces the
   previous one until it becomes final?

   Milan: this is more powerful and gives more flexibility
   ... I thought the proposal was simpler, as discussed at the F2F
   ... Why doesn't the recognizer send another complete hypothesis,
   replacing all preceding results?
   ... The combined result would be more efficient (fewer headers)
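   The two models being compared can be expressed with one update rule.
   In the hypothetical sketch below, each update replaces every entry
   from fromIndex onward: Milan's "send another complete hypothesis" is
   the special case fromIndex = 0, while incremental chunking uses a
   larger fromIndex. The result count can shrink when the replacement
   tail is shorter, matching the point that the number of results can
   decrease with corrections. applyUpdate and Hypothesis are
   illustrative names, not taken from either proposal.

```typescript
// Illustrative types; not taken from either proposal.
interface Hypothesis {
  text: string;
  final: boolean;
}

// Replace every entry from `fromIndex` onward with `tail`.
// fromIndex = 0 means "a complete new hypothesis replacing all preceding results".
function applyUpdate(
  current: readonly Hypothesis[],
  fromIndex: number,
  tail: readonly Hypothesis[],
): Hypothesis[] {
  if (!Number.isInteger(fromIndex) || fromIndex < 0 || fromIndex > current.length) {
    throw new RangeError(`fromIndex ${fromIndex} out of range`);
  }
  return [...current.slice(0, fromIndex), ...tail];
}

let results: Hypothesis[] = [];
results = applyUpdate(results, 0, [{ text: "recognize speech", final: false }]);
// Incremental: keep entry 0, append a new chunk.
results = applyUpdate(results, 1, [{ text: "today", final: false }]);
// Correction: the recognizer revises everything into one entry, so the count drops.
results = applyUpdate(results, 0, [{ text: "wreck a nice beach today", final: true }]);
console.log(results.length); // 1
```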

   mbodell: result chunk size is not necessarily the same as finalizing
   chunk size

   Milan: If you are to dictate an email, are we expecting to have lots
   of indexes?
   ... we have an unrealistic example here

   Bringert: Michael can you explain a bit better the chunking?
   ... From a single result you get a single piece of semantics

   mbodell: What can't you do with the array?
   ... in a non-continuous world it's not a problem

   bringert: how do you see this from the UI point of view?

   mbodell: you concatenate them each time
   ... The intent was to have the exact sequence
   ... if you have 3 results and one gets modified you get them
   ... you get a normal result anyway

   ddahl: how does it work with nbest?

   mbodell: I can see an API where you have results and one is wrong
   and gets replaced
   ... I agree it solves the simple use cases and it is more
   complicated for others

   bringert: How do I know when to interpret the actions? We should
   get a final for everything

   Charles: maybe the UI wants to show just finals. You want finals to
   come at a reasonable pace

   mbodell: there's a tradeoff

   Milan: I know it's not going to change but maybe there's an external
   input that might change it
   ... if the user changes one word or another it might trigger more
   changes and might send an updated array

   burn: the reason I like final is that I can archive them

   glen: If you have a command based on what the user said it might not
   be undoable
   ... there's the use case where the user doesn't care about
   preliminary. They just need the final

   mbodell: this is more for online improvements

   glen: the rerecognize should solve our problem
   ... if you have an 8-hour dictation and you have a correction in the
   first sentences, are you going to send the whole 8 hours?

   mbodell: we should put a limit. We can solve this in the protocol

   Milan: All I'm asking is a definition of final

   bringert: we should define final as something that never changes
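   If final is defined as something that never changes, a client can
   archive final entries as they arrive (the use case burn raised) and
   treat any later change to one as an error. A minimal sketch, assuming
   each event delivers the full results array with a per-entry final
   flag; FinalArchive is a hypothetical helper, not part of the
   proposal.

```typescript
// Hypothetical per-entry shape: text plus a final flag.
interface Hypothesis {
  text: string;
  final: boolean;
}

class FinalArchive {
  private archived: string[] = [];

  // Feed the full results array from each event; returns the newly archived entries.
  update(results: readonly Hypothesis[]): string[] {
    // Under "final is final", previously archived entries must reappear unchanged.
    this.archived.forEach((text, i) => {
      const r = results[i];
      if (r === undefined || !r.final || r.text !== text) {
        throw new Error(`final result ${i} changed`);
      }
    });
    // Archive the contiguous run of newly finalized entries after the archived prefix.
    const added: string[] = [];
    for (let i = this.archived.length; i < results.length && results[i].final; i++) {
      added.push(results[i].text);
    }
    this.archived.push(...added);
    return added;
  }

  get all(): readonly string[] {
    return this.archived;
  }
}

const archive = new FinalArchive();
console.log(JSON.stringify(archive.update([
  { text: "dear team", final: true },
  { text: "the", final: false },
]))); // ["dear team"]
console.log(JSON.stringify(archive.update([
  { text: "dear team", final: true },
  { text: "the meeting", final: true },
]))); // ["the meeting"]
```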

   Milan: if we add a proprietary API, it would be a spec violation

   mbodell: I'm not sold on final being final, but I'm OK with it

   Milan: what would be the language in the spec then?

   bringert: Final will be final and in the future we would add a
   correction event
   ... we would add another callback

   mbodell: it is possible we can represent the result array as a
   read-only array in the result event

   <smaug> evt.results would become evt.target.results

   bringert: I'm fine with the way it is proposed right now
   ... would this be only in continuous?

   mbodell: in one shot the index would not be larger than one

   bringert: Maybe we should send different events for continuous and
   for one shot
   ... for one shot it works, and we have to have a boolean
   ... this is more the state of the request
   ... the nice thing about having the state in the request is that you
   don't have to look in the event
   ... do we have any outstanding issues?
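   smaug's note (evt.results becoming evt.target.results) and bringert's
   point that finality is really the state of the request can be
   sketched together: the request object owns the results, handlers read
   them through the event's target, and a one-shot request never holds
   more than one result. This is a hypothetical, DOM-free stand-in
   (SpeechRequestSketch, onResult, deliver), not the proposed interface.

```typescript
// Hypothetical result entry.
interface RecoResult {
  text: string;
  final: boolean;
}

type ResultHandler = (evt: { target: SpeechRequestSketch }) => void;

// Hypothetical request object that owns the recognition state.
class SpeechRequestSketch {
  readonly continuous: boolean;
  private _results: RecoResult[] = [];
  private handlers: ResultHandler[] = [];

  constructor(continuous: boolean) {
    this.continuous = continuous;
  }

  // Exposed read-only, so handlers cannot mutate the request's state.
  get results(): readonly RecoResult[] {
    return this._results;
  }

  // Request-level state: true when there is at least one result and all are final.
  get final(): boolean {
    return this._results.length > 0 && this._results.every((r) => r.final);
  }

  onResult(handler: ResultHandler): void {
    this.handlers.push(handler);
  }

  // Simulates the recognizer delivering a new combined results array.
  deliver(results: RecoResult[]): void {
    if (!this.continuous && results.length > 1) {
      throw new RangeError("a one-shot request holds at most one result");
    }
    this._results = results.slice();
    for (const h of this.handlers) h({ target: this });
  }
}

const req = new SpeechRequestSketch(false);
let shown = "";
req.onResult((evt) => {
  // Read state from the target rather than from the event itself.
  shown = evt.target.results.map((r) => r.text).join(" ");
});
req.deliver([{ text: "call mom", final: true }]);
console.log(shown, req.final); // call mom true
```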

   glen: what's outstanding is what the reco object would look like.
   Working on the proposal

TTS

   <mbodell> TTS element section of proposal:
   [11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#tts-section

     [11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#tts-section

   <mbodell> TTS JS API (not really filled in):
   [12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#speechoutputrequest-section

     [12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#speechoutputrequest-section

   bringert: this is basically extending the HTML5 media element

   mbodell: the media element is missing the mark

   bringert: in the proposal we submitted I added another attribute
   "last mark"

   <bringert> Bjorn's TTS proposal:
   [13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html

     [13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html

   mbodell: the time mark event is something new?

   bringert: the last mark is new
   ... you seek by time. Should last mark update then?

   mbodell: in VoiceXML there's something similar

   bringert: you will only update when you get a last mark
   ... if you want clock time you can store the marks yourself
   ... What if I want to send text? I guess I can use a data URI
   ... in my proposal I had a value element
   ... I would propose to keep the value element in the spec
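   For the clock-time case, bringert's suggestion is that the
   application store the marks itself. A minimal sketch, assuming a TTS
   element that exposes a lastMark attribute (as in Bjorn's proposal)
   alongside the media element's currentTime, and a handler that is
   called whenever a mark is reached; TtsLike and MarkLog are
   hypothetical. markAt returns the latest mark reached by time t, which
   is one possible answer to how a seek should relate to the last mark.

```typescript
// Hypothetical view of a TTS media element: the proposed lastMark plus currentTime (seconds).
interface TtsLike {
  readonly lastMark: string | null;
  readonly currentTime: number;
}

interface MarkEntry {
  mark: string;
  time: number;
}

class MarkLog {
  private entries: MarkEntry[] = [];

  // Call from a (hypothetical) mark-reached handler.
  record(tts: TtsLike): void {
    if (tts.lastMark !== null) {
      this.entries.push({ mark: tts.lastMark, time: tts.currentTime });
    }
  }

  // Latest recorded mark at or before time t, e.g. to restore state after a seek.
  markAt(t: number): string | null {
    let best: MarkEntry | null = null;
    for (const e of this.entries) {
      if (e.time <= t && (best === null || e.time >= best.time)) best = e;
    }
    return best === null ? null : best.mark;
  }
}

const marks = new MarkLog();
marks.record({ lastMark: "intro", currentTime: 0.5 });
marks.record({ lastMark: "chapter1", currentTime: 4.2 });
console.log(marks.markAt(3.0)); // intro
console.log(marks.markAt(10)); // chapter1
console.log(marks.markAt(0.1)); // null
```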

   mbodell: the issue is that media requires a source

   bringert: it's a media element we extend
   ... we have the use case where we need to send the text
   ... I'm fine with the way it is right now
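   The data URI workaround mentioned above for getting text into a media
   element's source can be sketched as follows. textToDataUri is a
   hypothetical helper; encodeURIComponent percent-encodes the text as
   UTF-8, and in a browser the resulting string could be assigned to the
   element's src. This assumes an implementation would accept data:
   URIs as a TTS source, which the draft would still need to state.

```typescript
type SpeechTextType = "text/plain" | "application/ssml+xml";

// Encode plain text or SSML as a data: URI (RFC 2397) with an explicit UTF-8 charset.
function textToDataUri(text: string, mimeType: SpeechTextType): string {
  // encodeURIComponent percent-encodes each character as UTF-8 bytes.
  return `data:${mimeType};charset=utf-8,${encodeURIComponent(text)}`;
}

const ssml =
  '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Hello</speak>';

console.log(textToDataUri("Hello world", "text/plain"));
// data:text/plain;charset=utf-8,Hello%20world
console.log(textToDataUri(ssml, "application/ssml+xml").startsWith("data:application/ssml+xml;"));
// true
```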

   mbodell: If we can avoid it, that would be great

   bringert: we should loop in the HTML5 WG for advice
   ... there is more language that has to be added

   <mbodell> add "Implementations should support at least UTF-8 encoded
   text/plain and application/ssml+xml." and other similar text
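   The proposed text names a minimum set of input types. A sketch of how
   an implementation might check a Content-Type value against that
   minimum, assuming a missing charset parameter is treated as UTF-8 (an
   assumption, not something the group decided); isMinimumSupported is
   hypothetical.

```typescript
const MINIMUM_TYPES: ReadonlySet<string> = new Set(["text/plain", "application/ssml+xml"]);

// True if the Content-Type is UTF-8 text/plain or application/ssml+xml.
// Assumption: an absent charset parameter is treated as UTF-8.
function isMinimumSupported(contentType: string): boolean {
  const parts = contentType.split(";").map((s) => s.trim().toLowerCase());
  const mediaType = parts[0];
  if (!MINIMUM_TYPES.has(mediaType)) return false;
  const charsetParam = parts.slice(1).find((p) => p.startsWith("charset="));
  if (charsetParam === undefined) return true;
  // Strip optional quotes around the parameter value.
  const charset = charsetParam.slice("charset=".length).replace(/^"(.*)"$/, "$1");
  return charset === "utf-8";
}

console.log(isMinimumSupported("application/ssml+xml; charset=UTF-8")); // true
console.log(isMinimumSupported("text/plain; charset=ISO-8859-1")); // false
console.log(isMinimumSupported("text/html")); // false
```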

   mbodell: do we need a SpeechOutput object?

   burn: SSML 1 or SSML 1.1?

   bringert: 1.1 is an extension, right?
   ... We should state that implementations should support 1 and 1.1

   burn: 1.1 gives more flexibility and has more intelligence
   ... when is a good time to start the sync discussion between API and
   protocol?

   mbodell: next week

   Robert: it would be better if people sent questions beforehand
Received on Friday, 30 September 2011 14:23:32 UTC