- From: Dan Burnett <dburnett@voxeo.com>
- Date: Tue, 7 Jun 2011 20:08:55 -0400
- To: public-xg-htmlspeech@w3.org
Group,

The minutes from last week's call are available at http://www.w3.org/2011/06/02-htmlspeech-minutes.html .

For convenience, a text version is embedded below.

Thanks to Michael Johnston for taking the minutes!

-- dan

****************************************************************************************************

Attendees

Present
   Dan_Burnett, Milan_Young, Marc_Schroeder, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri

Regrets

Chair
   Dan Burnett

Scribe
   Michael_Johnston

Contents

* Topics
  1. updated final report document
  2. agreed upon design decisions
  3. additional issues to add to list of issues
  4. markup binding
  5. crucial decisions partially discussed
  6. do we support audio streaming and how?
  7. What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
* Summary of Action Items
_________________________________________________________

<burn_> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html

burn: start with review of face to face minutes, will review again next week
... comments on minutes

updated final report document

burn: comments on draft at this point?
all: silence

agreed upon design decisions

additional issues to add to list of issues

michael: does move to have emma document in dom, remove impetus for json variant of emma
bjorn: have simple javascript api for accessing most common elements, don't need json variant of emma, for details can access emma object
milan: need to do xml parsing?
bodell: will be much the same as other http requests that return xml, don't need to parse
milan: are mobile devices a problem, verbosity of xml
bodell: no
dan: is there pressure from this group to build a json version of emma?
all: agreement: no push for json version of emma
burn: any other issues to add to list for discussion

markup binding

bjorn: no feedback from chrome team yet
bodell: keep html binding lightweight, js constructor, simple "for" mechanism, small work to define, if don't want then remove the element
... should not mess up js api
olli: problem with for attribute is what it can point to, what elements can be used as target, doesn't quite work with content editable, important use case
... clarifies issue, need to make clear which elements can be targets and what the semantics is
... also content editable areas
michael: have to define semantics when target is e.g. a drop down or radio button
olli: may be new kinds of elements also
bodell: assumption would be to bind to any element, but they would not all have to work
bodell: some browsers would want to handle more input types
olli: reco would be element in the dom, what is the benefit of the reco
... if for is not used
bodell: google desire to have element with microphone click api
bjorn: have proposed several things along the way,
... most important aspect is to have an element you can click to start speaking without the pop up or info bar
olli: not clear what the element gives
robert: follow up with chrome folks
bjorn: still waiting on that
... do agree that html element discussion does not block the js api discussion
olli: issue may get solved along the way
burn: need to see concrete proposal to make decision

crucial decisions partially discussed

<marc> http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html

burn: will go through each ...
bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
burn: audio codecs mandatory
robert: even IP status around speex is unclear also
... are only reasonable answers pcm and mulaw, despite their flaws
bjorn: flac, high bandwidth
<bringert> FLAC
<bringert> http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
milan: speex is in ietf draft on how to package in rtp
<Milan> http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
burn: rtp way to send it does not mean there are ip issues
bjorn: need to require some codecs or can't be interoperable
milan: problems sound similar
bodell: rfc does not have a patent policy
burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
bjorn: need protocol for interoperability
milan: protocol for RTC
burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
... can use either one if you have permission
... don't have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
<mbodell> See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
robert: will follow up re: speex again
milan: impact on protocol team if need to negotiate codec
marc: speex is not good enough for tts
<marc> ogg vorbis
<marc> and FLAC
burn: few names as candidates flac, ogg vorbis, speex, pcm
bjorn: already use flac in launched clients
olli: ogg vorbis is core html audio
<smaug> not core HTML audio. Some browsers just happen to support it
burn: candidates to consider flac, ogg vorbis, speex, pcm

do we support audio streaming and how?

burn: think we expect streaming, less clarity on how
milan: sending audio on regular time intervals as it is collected or generated
bjorn: discussed how to get events while capturing
... how it is done is a protocol question
burn: asr may begin before the user is
... finished speaking, result before engine comes
milan: without regular timed packets, won't get events on regular interval
bjorn: latency is what is app observable
bodell: having multiple events is not a big problem
... data in events can deal with timing
milan: if app is realtime, five seconds ago got this event
bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
burn: market takes care of product requirements
robert: fair to say that standard should not have inherent limitations
... 50 ms or so is the threshold
bjorn: protocol design should not make it impossible to achieve low latency event delivery
marc: audio streaming in the tts case?
... send audio while still rendering rest of a long utterance
bodell: tts is generally fast enough that this is not a problem
marc: if tts has to process all text before returning audio, could be a problem,
... wants to make sure that what we create here does not prevent an implementation doing this
bjorn: up to engine whether it starts to synthesize
marc: wav format, header has filesize, makes proper streaming
bjorn: protocol should make it possible for the tts to be streamed and start playing before
... synthesis is complete
burn: issue of supporting format coming back in video
... and playing the audio
bjorn: should not require playing audio from video
robert: api should not prevent this
burn: video with three audio tracks, how does the api select
robert: our proposal separated capture api from reco, could support different kinds of capture
burn: protocol design should not preclude streaming of video codecs
raj: why specify video?
robert: if codec can be packetized in real time should be ok
burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?

What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?

bodell: issue of latency impacting times
bjorn: agreed UA being basis for the clock
burn: don't have requirements for timing info from server
bodell: tts case?
bjorn: seems reasonable for server to include timing info
robert: could do offset from start
burn: something that UA can convert into UA local timestamp
... different ways to achieve that
... doesn't say what is made available in the api
bodell: many different times, when the utterance start etc
bodell: when received,
marc: impact on order that events are received
milan: will UA generate these events when using remote service
bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
milan: start of energy is different than start of speech
... hard to write web app if get two start of speech events
bodell: different events,
bodell: was fixed order for the non continuous case
charles: could arrange fixed order delivery, even if times inside do not reflect this
bodell: not practical to hold events and put them in the desired order
burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
marc: don't want to override better info from remote service
burn: front is for optimization so don't have to send all the audio
bodell: events could be in different orders
... not convinced in having standard order
milan: UA only have sound start, sound end
... avoid duplication,
bodell: already have different event names
robert: in name need to make clear some events are from energy ...
detector others are from speech reco
milan: source of events
bodell: unmake statement about specific ordering
milan: new statement that user agent can insert are energy related events
marc: and probably capture start and end
charles: seems strong since speech service might or might not be remote
burn: removed ordering
... energy detector can only generate sound start stop
burn: speech service can only deliver the speech start stop
charles: if not order can be guarantee delivery
burn: how to guarantee it
milan: as long as have single source for events
michael: (need a blackboard for this)
bodell: solved by removing required ordering
... allows all the use cases
... also works with continuous case
... thought had solved the issue
burn: but start before end?
... can get end without having seen a start
milan: reluctant to give up the ordering, if have single source for each type of event
burn: agreed speech service can only generate one, can't guarantee that they won't cross in time
milan: use remote speech service as the canonical
bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
milan: it is possible to impose an ordering
... pros and cons, flexibility, or predictability for the web app developer
bjorn: events from the same source should be in the same order
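
Editorial illustration (not part of the minutes): the endpointing discussion above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not anything the group agreed on. The `Event` type, the source labels "ua" and "service", the event names, and all timing values are hypothetical. It assumes the speech service reports times as offsets from the start of audio, that the UA converts them to its own clock, and that the only ordering constraint is the one bjorn states last: events from the same source stay in order, while events from different sources may cross.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical labels: "ua" (energy detector) or "service" (recognizer)
    name: str         # hypothetical event name, e.g. "soundstart" or "speechend"
    ua_time_ms: int   # timestamp expressed on the UA's clock


def to_ua_time(capture_start_ms: int, offset_ms: int) -> int:
    """Convert a service-reported offset from the start of audio to a UA-clock timestamp."""
    return capture_start_ms + offset_ms


def preserves_source_order(delivered: List[Event]) -> bool:
    """Return True if timestamps never go backwards within any single source."""
    last_seen: Dict[str, int] = {}
    for ev in delivered:
        # With no earlier event from this source, the default makes the check pass.
        if ev.ua_time_ms < last_seen.get(ev.source, ev.ua_time_ms):
            return False
        last_seen[ev.source] = ev.ua_time_ms
    return True


capture_start = 1000  # UA clock reading when capture began (made-up value)

# Delivery order as a web app might observe it: local energy-detector events
# first, service events later because of network delay.
delivered = [
    Event("ua", "soundstart", 1100),
    Event("ua", "soundend", 2100),
    Event("service", "speechstart", to_ua_time(capture_start, 150)),
    Event("service", "speechend", to_ua_time(capture_start, 900)),
]

print(preserves_source_order(delivered))
```

In this made-up delivery the timestamps are not globally sorted (the service's speech start arrives after the UA's sound end), yet each source's own events are in order, which is the weaker guarantee discussed at the end of the call.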
Received on Wednesday, 8 June 2011 00:09:35 UTC