- From: Dan Burnett <dburnett@voxeo.com>
- Date: Fri, 30 Sep 2011 10:22:51 -0400
- To: public-xg-htmlspeech@w3.org
Group,
The minutes from yesterday's call are available at http://www.w3.org/2011/09/29-htmlspeech-minutes.html
For convenience, a text version is embedded below.
Thanks to Dan Druta for taking the minutes.
-- dan
**********************************************************************************
HTML Speech Incubator Group Teleconference
29 Sep 2011
[2]Agenda
[2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0042.html
See also: [3]IRC log
[3] http://www.w3.org/2011/09/29-htmlspeech-irc
Attendees
Present
Dan_Burnett, Bjorn_Bringert, Michael_Bodell, Olli_Pettay,
Milan_Young, Debbie_Dahl, Dan_Druta, Patrick_Ehlen,
Charles_Hemphill, Robert_Brown, Glen_Shires, Michael_Johnston
Regrets
Chair
Michael_Bodell
Scribe
Dan_Druta
Contents
* Topics
1. Web API
2. TTS
* Summary of Action Items
_________________________________________________________
Web API
<mbodell> Discussing SpeechInputResults
<mbodell> Bjorn's mail: [8]
[8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html
<mbodell> Satish's proposal: [9]
[9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html
mbodell: speech results discussion
<mbodell> My mail: [10]
[10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0043.html
bringert: I did not understand what the semantics were
mbodell: Three interfaces, as in Bjorn's proposal
... You get a speech result and inside you get an array of results
bringert: so the results are a history
burn: So the result number will not decrease
mbodell: It can decrease if you get corrections
bringert: what is the benefit of having everything?
burn: It's not the history. It's the combined one
mbodell: in a simple case you get a concatenation
bringert: How do you know if this is preliminary?
mbodell: we can add a boolean
Charles: Don't you want to know if it's final?
burn: when the event carries a Boolean marked as final, it means
that the particular indexed value in the results array is final
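The array-plus-final-flag model described above (an array of results whose entries can be revised by index, concatenated in the simple case, each entry carrying a final flag) can be sketched as follows. This is a minimal sketch under assumptions: `ResultEntry`, `ResultUpdate`, `applyUpdate` and `combinedTranscript` are hypothetical names, not taken from any of the linked proposals, and shrinking the array on corrections (which mbodell mentioned) is not modeled.

```typescript
// Hypothetical shape of one entry in the results array (illustrative names only).
interface ResultEntry {
  transcript: string;
  final: boolean; // true once the recognizer marks this index as final
}

// Hypothetical update carried by a result event: which index changed, and how.
interface ResultUpdate {
  index: number;
  entry: ResultEntry;
}

// Applies one update to the page's copy of the results array. An index equal to
// the current length appends; a smaller index replaces an existing entry
// (for example, a preliminary entry being revised).
function applyUpdate(results: ResultEntry[], update: ResultUpdate): void {
  if (update.index < 0 || update.index > results.length) {
    throw new RangeError(`index ${update.index} out of range`);
  }
  results[update.index] = update.entry;
}

// In the simple case the overall hypothesis is the concatenation of the entries.
function combinedTranscript(results: ResultEntry[]): string {
  return results.map((r) => r.transcript).join(" ");
}

// Usage: a preliminary entry is revised and finalized, then a new one is appended.
const results: ResultEntry[] = [];
applyUpdate(results, { index: 0, entry: { transcript: "send an", final: false } });
applyUpdate(results, { index: 0, entry: { transcript: "send an email", final: true } });
applyUpdate(results, { index: 1, entry: { transcript: "to bob", final: false } });
console.log(combinedTranscript(results)); // prints "send an email to bob"
```

A UI that only wants finals can filter on `final` before concatenating; the per-index flag is what lets it do so without waiting for the whole utterance.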
Milan: I can see results coming back finalized but the recognizer
then shuffling them around
bringert: It's up to recognizer
Milan: let's have an example that shows that chunking
bringert: Why can't we have a preliminary result that replaces the
previous one until it hits final?
Milan: this is more powerful and gives more flexibility
... I thought the proposal was simpler, as was discussed at the F2F
... Why not the recognizer send another complete hypothesis,
replacing all preceding results?
... The combined result would be more efficient (less headers)
mbodell: result chunk size is not necessarily the same as finalizing
chunk size
Milan: If you dictate an email, are we expecting to have lots of
indexes?
... we have an unrealistic example here
bringert: Michael, can you explain the chunking a bit better?
... From a single result you get a single piece of semantics
mbodell: What can't you do with the array?
... in a non continuous world it's not a problem
bringert: how do you see this from the UI point of view?
mbodell: you concatenate them each time
... The intent was to have the exact sequence
... if you have 3 results and one gets modified you get them
... you get a normal result anyway
ddahl: how does it work with nbest?
mbodell: I can see an API where you have results and one is wrong
and gets replaced
... I agree it solves the simple use cases and it is more
complicated for others
bringert: How do I know when to interpret the actions? We should
get a final for everything
Charles: maybe the UI wants to show just finals. You want finals to
come at a reasonable pace
mbodell: there's a tradeoff
Milan: I know it's not going to change but maybe there's an external
input that might change it
... if the user changes one word or another it might trigger more
changes and might send an updated array
burn: the reason I like final is that I can archive them
glen: If you have a command based on what the user said it might not
be undoable
... there's the use case where the user doesn't care about
preliminary. They just need the final
mbodell: this is more for online improvements
glen: the rerecognize should solve our problem
... if you have an 8-hour dictation and you have a correction in the
first sentences, are you going to send the whole 8 hours?
mbodell: we should put a limit. We can solve this in the protocol
Milan: All I'm asking is a definition of final
bringert: we should define final as something that never changes
Milan: if we add a proprietary API, it would be a spec violation
mbodell: I'm not sold on "final is final", but I'm OK with it
Milan: what would be the language in the spec then?
bringert: Final will be final and in the future we would add a
correction event
... we would add another callback
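The "final is final" definition bringert proposed, together with burn's point that finals can be archived, can be sketched as a small client-side guard. This is a minimal sketch under assumptions: `FinalEntry` and `FinalizingResults` are hypothetical names, and any later change to a final entry would have to arrive through the separate correction event discussed, which is not modeled here.

```typescript
// Hypothetical entry shape (illustrative only).
interface FinalEntry {
  transcript: string;
  final: boolean;
}

// Keeps the page's copy of the results and enforces that a final entry never
// changes. Finals are archived in the order they were finalized.
class FinalizingResults {
  private readonly entries: FinalEntry[] = [];
  private readonly archived: string[] = [];

  update(index: number, entry: FinalEntry): void {
    if (index < 0 || index > this.entries.length) {
      throw new RangeError(`index ${index} out of range`);
    }
    const existing = this.entries[index];
    if (existing !== undefined && existing.final) {
      // Under "final is final" this would be a protocol error, not a correction.
      throw new Error(`entry ${index} is final and cannot change`);
    }
    this.entries[index] = entry;
    if (entry.final) {
      this.archived.push(entry.transcript); // safe to archive: it will not change
    }
  }

  get finals(): readonly string[] {
    return this.archived;
  }
}

// Usage: a preliminary entry is replaced by its final version, which is archived.
const tracker = new FinalizingResults();
tracker.update(0, { transcript: "hello", final: false });
tracker.update(0, { transcript: "hello world", final: true });
console.log(tracker.finals.join(" | ")); // prints "hello world"
```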
mbodell: it is possible we can represent the result array as a
read-only array in the result event
<smaug> evt.results would become evt.target.results
bringert: I'm fine with the way it is proposed right now
... would this be only in continuous?
mbodell: in one shot the index would not be larger than one
bringert: Maybe we should send different events for continuous and
for one shot
... for one shot works and have to have a boolean
... this is more the state of the request
... the nice thing about having it in the request is that you don't
have to look in the event
... do we have any outstanding issues?
glen: what's outstanding is what the reco object would look like.
Working on the proposal
TTS
<mbodell> TTS element section of proposal: [11]
[11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#tts-section
<mbodell> TTS JS API (not really filled in): [12]
[12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0008/speechwepapi.html#speechoutputrequest-section
bringert: this is basically extending the HTML5 media element
mbodell: the media element is missing the mark
bringert: in the proposal we submitted I added another attribute
"last mark"
<bringert> Bjorn's TTS proposal: [13]
[13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html
mbodell: the time mark event is something new?
bringert: the last mark is new
... you seek by time. Should last mark update then?
mbodell: in VoiceXML there's something similar
bringert: you will only update when you get a last mark
... if you want clock time you can store the marks yourself
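Storing marks yourself, as bringert suggests, could look like the following. This is a minimal sketch, assuming the page is told the mark name and the media element's current time (in seconds) whenever an SSML mark is reached; `MarkLog` is a hypothetical helper, and how marks are surfaced to the page (an event or a last-mark attribute) is left open, as it was on the call. The `markAt` lookup addresses the seek question raised above.

```typescript
// One SSML mark reached during playback, with the media time (seconds) at
// which the page observed it.
interface MarkRecord {
  name: string;
  time: number;
}

class MarkLog {
  private readonly records: MarkRecord[] = [];

  // Records a mark; assumes marks are reported in increasing playback time.
  record(name: string, time: number): void {
    const last = this.records[this.records.length - 1];
    if (last !== undefined && time < last.time) {
      throw new Error("marks must be recorded in playback order");
    }
    this.records.push({ name, time });
  }

  // Returns the name of the latest mark at or before `time` (e.g. after a seek),
  // or null if no mark has been passed yet. Binary search over sorted times.
  markAt(time: number): string | null {
    let lo = 0;
    let hi = this.records.length - 1;
    let found: string | null = null;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (this.records[mid].time <= time) {
        found = this.records[mid].name;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found;
  }
}

// Usage: marks observed at 1 s, 3 s and 5 s; seeking to 4 s lands after "body".
const log = new MarkLog();
log.record("intro", 1);
log.record("body", 3);
log.record("end", 5);
console.log(log.markAt(4)); // prints "body"
```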
... What if I want to send text? I guess I can use a data uri
... in my proposal I had a value element
... I would propose to keep the value element in the spec
mbodell: the issue is that media requires a source
bringert: it's a media element we extend
... we have the use case where we need to send the text
... I'm fine with the way it is right now
mbodell: If we can avoid it, that would be great
bringert: we should loop in the HTML5 WG for advice
... there is more language that has to be added
<mbodell> add "Implementations should support at least UTF-8 encoded
text/plain and application/ssml+xml. " and other likewise text
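bringert's suggestion of using a data URI to send text can be combined with the two formats quoted above. This is a minimal sketch of building a data: URI (RFC 2397) carrying UTF-8 text/plain or SSML; `speechDataUri` and `SpeechMime` are hypothetical names, and whether a media-based TTS element would accept such a URI as its source was not settled on the call.

```typescript
// The two formats named in the proposed spec text.
type SpeechMime = "text/plain" | "application/ssml+xml";

// Builds a data: URI for UTF-8 text. encodeURIComponent percent-encodes the
// string's UTF-8 bytes, so non-ASCII text round-trips through the URI.
function speechDataUri(content: string, mime: SpeechMime): string {
  return `data:${mime};charset=utf-8,${encodeURIComponent(content)}`;
}

const plain = speechDataUri("Hello world", "text/plain");
console.log(plain); // prints "data:text/plain;charset=utf-8,Hello%20world"

const ssml = speechDataUri(
  '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Hello</speak>',
  "application/ssml+xml",
);
console.log(ssml.startsWith("data:application/ssml+xml;charset=utf-8,")); // prints true
```

A page could then assign the result to the element's `src` attribute, assuming the element treats data: URIs like any other media source.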
mbodell: do we need a SpeechOutput object?
burn: SSML 1 or SSML 1.1?
bringert: 1.1 is an extension, right?
... We should state that implementation should support 1 and 1.1
burn: 1.1 gives more flexibility and has more intelligence
... when is a good time to start the sync discussion between API and
protocol?
mbodell: next week
Robert: it would be better if people sent questions beforehand
Received on Friday, 30 September 2011 14:23:32 UTC