- From: Dan Burnett <dburnett@voxeo.com>
- Date: Tue, 7 Jun 2011 20:08:55 -0400
- To: public-xg-htmlspeech@w3.org
Group,
The minutes from last week's call are available at http://www.w3.org/2011/06/02-htmlspeech-minutes.html
.
For convenience, a text version is embedded below.
Thanks to Michael Johnston for taking the minutes!
-- dan
****************************************************************************************************
Attendees
Present
Dan_Burnett, Milan_Young, Marc_Schroeder, Robert_Brown,
Patrick_Ehlen, Charles_Hemphill, Satish_Sampath, Glen_Shires,
Michael_Johnston, Olli_Pettay, Michael_Bodell,
Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
Regrets
Chair
Dan Burnett
Scribe
Michael_Johnston
Contents
* Topics
1. updated final report document
2. agreed upon design decisions
3. additional issues to add to list of issues
4. markup binding
5. crucial decisions partially discussed
6. do we support audio streaming and how?
7. What is meant by "start of speech", "end of speech",
and endpointing in general? How do transmission delays
affect the definitions and what we want in terms of APIs?
* Summary of Action Items
_________________________________________________________
<burn_> Agenda:
http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
burn: start with review of face to face minutes, will review again
next week
... comments on minutes
updated final report document
burn: comments on draft at this point?
all: silence
agreed upon design decisions
additional issues to add to list of issues
michael: does move to have emma document in dom remove impetus for
json variant of emma?
bjorn: have simple javascript api for accessing most common
elements, dont need json variant of emma, for details can access
emma object
milan: need to do xml parsing?
bodell: will be much the same as other http requests that return
xml, dont need to parse
milan: are mobile devices a problem, verbosity of xml
bodell: no
dan: is there pressure from this group to build a json version of
emma?
all: agreement: no push for json version of emma
burn: any other issues to add to list for discussion
markup binding
bjorn: no feedback from chrome team yet
bodell: keep html binding lightweight, js constructor, simple "for"
mechanism, small work to define, if dont want then remove the
element
... should not mess up js api
olli: problem with for attribute is what it can point to, what
elements can be used as target, doesnt quite work with content
editable, important use case
... clarifies issue, need to make clear which elements can be
targets and what the semantics is
... also content editable areas
michael: have to define semantics when target is e.g. a drop down or
radio button
olli: may be new kinds of elements also
bodell: assumption would be to bind to any element, but they would
not all have to work,
bodell: some browsers would want to handle more input types
olli: reco would be element in the dom, what is the benefit of the
reco
... if for is not used
bodell: google desire to have element with microphone click api
bjorn: have proposed several things along the way,
... most important aspect is to have an element you can click to
start speaking without the pop up or info bar
olli: not clear what the element gives
robert: follow up with chrome folks
bjorn: still waiting on that
... do agree that html element discussion does not block the js api
discussion
olli: issue may get solved along the way
burn: need to see concrete proposal to make decision
crucial decisions partially discussed
<marc>
http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
burn: will go through each ...
bjorn: audio capture topic is dealt with, should be default way, if
there is an audio capture api will deal with then
burn: audio codecs mandatory
robert: even the IP status around speex is unclear
... are only reasonable answers pcm and mulaw, despite their flaws
bjorn: flac, high bandwidth
<bringert> FLAC
<bringert> http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
milan: speex is in ietf draft on how to package in rtp
<Milan> http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
burn: rtp way to send it does not mean there are ip issues
bjorn: need to require some codecs or cant be interoperable
milan: problem sounds similar
bodell: rfc does not have a patent policy
burn: if something is necessary to implement the spec, and it is
encumbered with IP, need to make that clear
bjorn: need protocol for interoperability
milan: protocol for RTC
burn: opus, codecs from two organizations, trying to blend, not
clear if IP issues are being resolved, making container
... can use either one if you have permission
... dont have an answer yet, really need one, industry wide problem,
may not be ours to solve, return to this
<mbodell> See
http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html
for some similar discussion on patent of speex
robert: will follow up re: speex again
milan: impact on protocol team if need to negotiate codec
marc: speex is not good enough for tts
<marc> ogg vorbis
<marc> and FLAC
burn: few names as candidates flac, ogg vorbis, speex, pcm
bjorn: already use flac in launched clients
olli: ogg vorbis is core html audio
<smaug> not core HTML audio. Some browsers just happen to support it
burn: candidates to consider flac, ogg vorbis, speex, pcm
do we support audio streaming and how?
burn: think we expect streaming, less clarity on how
milan: sending audio on regular time intervals as it is collected or
generated
bjorn: discussed how to get events while capturing
... how it is done is a protocol question
burn: asr may begin before the user is
... finished speaking, result before engine comes
milan: without regular timed packets, wont get events on regular
interval
bjorn: latency is what is app observable
bodell: having multiple events is not a big problem
... data in events can deal with timing
milan: if app is realtime, five seconds ago got this event
bjorn: agree, what we need is low latency, not sure what we can
require, part of being a good implementation
burn: market takes care of product requirements
robert: fair to say that standard should not have inherent
limitations
... 50 ms or so is the threshold
bjorn: protocol design should not make it impossible to achieve low
latency event delivery
marc: audio streaming in the tts case?
... send audio while still rendering rest of a long utterance
bodell: tts is generally fast enough that this is not a problem
marc: if tts has to process all text before returning audio, could
be a problem,
... wants to make sure that what we create here does not prevent an
implementation doing this
bjorn: up to engine whether it starts to synthesize
marc: wav format, header has filesize, makes proper streaming difficult
bjorn: protocol should make it possible for the tts to be streamed
and start playing before
... synthesis is complete
burn: issue of supporting format coming back in video
... and playing the audio
bjorn: should not require playing audio from video
robert: api should not prevent this
burn: video with three audio tracks, how do apis select
robert: our proposal separated capture api from reco, could support
different kinds of capture
burn: protocol design should not preclude streaming of video codecs
raj: why specify video?
robert: if codec can be packetized in real time should be ok
burn: the protocol should not inhibit the transmission of codecs that
have similar requirements to audio?
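As a rough illustration of marc's remark about the wav header, here is a minimal sketch of a canonical 44-byte PCM WAV header. It only shows that two header fields depend on the total audio length, which a sender does not know while synthesis is still running. The function name wav_header and the default sample rate, channel count and sample width are assumptions for the example, not something the group discussed.

```python
import struct

def wav_header(data_bytes: int, sample_rate: int = 16000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a canonical 44-byte PCM WAV header (hypothetical helper)."""
    byte_rate = sample_rate * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes,   # RIFF chunk size depends on total length
        b"WAVE",
        b"fmt ", 16, 1,             # fmt chunk size, format tag 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_bytes,        # data chunk size depends on total length
    )

# One second of 16 kHz, 16-bit mono audio is 32000 bytes of samples.
header = wav_header(32000)
```

Both size fields sit at the front of the file, so a strictly conforming header cannot be written until the whole utterance has been rendered, which is the streaming concern raised above.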
What is meant by "start of speech", "end of speech", and endpointing in
general? How do transmission delays affect the definitions and what we
want in terms of APIs?
bodell: issue of latency impacting times
bjorn: agreed UA being basis for the clock
burn: dont have requirements for timing info from server
bodell: tts case?
bjorn: seems reasonable for server to include timing info
robert: could do offset from start
burn: something that UA can convert into UA local timestamp
... different ways to achieve that
... doesnt say what is made available in the api
bodell: many different times, when the utterance starts etc
bodell: when received,
marc: impact on order that events are received
milan: will UA generate these events when using remote service
bodell: may assume energy detector gives you end of speech, before
reco gives end of speech, hard to guarantee order
milan: start of energy is different than start of speech
... hard to write web app if get two start of speech events
bodell: different events,
bodell: was fixed order for the non continuous case
charles: could arrange fixed order delivery, even if times inside do
not reflect this
bodell: not practical to hold events and put them in the desired
order
burn: energy detector gets end of sound, then will get actual end of
speech with better timing info, either get two or throw away
better info
marc: dont want to override better info from remote service
burn: front is for optimization so dont have to send all the audio
bodell: events could be in different orders
... not convinced in having standard order
milan: UA only have sound start, sound end
... avoid duplication,
bodell: already have different event names
robert: in name need to make clear some events are from energy
... detector others are from speech reco
milan: source of events
bodell: unmake statement about specific ordering
milan: new statement that what the user agent can insert are energy
related events
marc: and probably capture start and end
charles: seems strong since speech service might or might not be
remote
burn: removed ordering
... energy detector can only generate sound start stop
burn: speech service can only deliver the speech start stop
charles: if not order, can guarantee delivery
burn: how to guarantee it
milan: as long as have single source for events
michael: (need a blackboard for this)
bodell: solved by removing required ordering
... allows all the use cases
... also works with continuous case
... thought had solved the issue
burn: but start before end?
... can get end without having seen a start
milan: reluctant to give up the ordering, if have single source for
each type of event
burn: agreed speech service can only generate one, can't guarantee
that they wont cross in time
milan: use remote speech service as the canonical
bodell: easiest to understand cross for end, UA would raise both
events in the order they occurred
milan: it is possible to impose an ordering
... pros and cons, flexibility, or predictability for the web app
developer
bjorn: events from the same source should be in the same order
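The two points the discussion converged on (service times expressed as an offset that the UA converts to its own clock, and ordering preserved only within a single source) can be sketched as below. This is a minimal illustration under assumptions: the Event class, the event names, the "ua" and "service" source labels and both function names are hypothetical and do not come from any proposal in the minutes.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str          # illustrative, e.g. "soundstart" or "speechend"
    source: str        # "ua" (energy detector) or "service" (recognizer)
    ua_time_ms: float  # timestamp on the UA clock

def to_ua_time(capture_start_ms: float, offset_ms: float) -> float:
    """Convert a service offset from start of audio into a UA-local time."""
    return capture_start_ms + offset_ms

def per_source_order_ok(delivered: list) -> bool:
    """True if events from each source arrive in non-decreasing time order.

    No ordering is imposed across different sources.
    """
    last_seen = {}
    for ev in delivered:
        if ev.source in last_seen and ev.ua_time_ms < last_seen[ev.source]:
            return False
        last_seen[ev.source] = ev.ua_time_ms
    return True

capture_start = 1000.0
delivered = [
    Event("soundstart", "ua", 1010.0),
    Event("speechstart", "service", to_ua_time(capture_start, 40.0)),
    Event("soundend", "ua", 1500.0),
    Event("speechend", "service", to_ua_time(capture_start, 450.0)),
]
```

In this sequence the UA's sound end is delivered before the service's speech end even though the service timestamp is earlier, which is the crossing case burn and bodell describe; the per-source check still passes.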
Received on Wednesday, 8 June 2011 00:09:35 UTC