[minutes] updated for 5 May 2011

From: Dan Burnett <dburnett@voxeo.com>
Date: Thu, 12 May 2011 11:01:07 -0400
Message-Id: <D1ADB6E0-5753-44FE-ABAE-9DB137ACBBD0@voxeo.com>
To: public-xg-htmlspeech@w3.org
An updated HTML version of the minutes, with typos fixed, is available at
http://www.w3.org/2005/Incubator/htmlspeech/2011/05/05-htmlspeech-minutes.html

-- dan

On May 11, 2011, at 6:05 AM, Dan Burnett wrote:

> Group,
>
> The minutes are available at http://www.w3.org/2011/05/05-htmlspeech-minutes.html
>
> For convenience, a text version is below.
>
> Thanks to Charles Hemphill for taking minutes!
>
> -- dan
>
> Attendees
>
>   Present
>          Dan_Burnett, Michael_Bodell, Bjorn_Bringert, Robert_Brown,
>          Olli_Pettay, Charles_Hemphill, Patrick_Ehlen, Dan_Druta,
>          Michael_Johnston, Raj_Tumuluri
>
>   Regrets
>          Debbie_Dahl, Marc_Schroeder
>
>   Chair
>          Dan_Burnett
>
>   Scribe
>          Charles_Hemphill
>
> Contents
>
>     * [4]Topics
>         1. [5]F2F Logistics: Any updates on attendance, hotel
>            bookings, and questions or details from Bjorn.
>         2. [6]Review new text in updated "Final Report" document to
>            ensure it matches what people think we agreed upon in our
>            last teleconference.
>         3. [7]Determine if we already have other agreed-upon design
>            decisions.
>         4. [8]Begin discussing issues listed in the Appendix.
>     * [9]Summary of Action Items
>     _________________________________________________________
>
>   <burn> trackbot, start telcon
>
>   <trackbot> Date: 05 May 2011
>
>   <burn> Scribe: Charles_Hemphill
>
>   <burn> ScribeNick: Charles
>
>   <burn> Agenda:
>   [10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011May/0001.html
>
>     [10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011May/0001.html
>
> F2F Logistics: Any updates on attendance, hotel bookings, and questions
> or details from Bjorn.
>
>   Bjorn: no updates on F2F
>
>   Burn: will send out schedule in the next few days.
>
> Review new text in updated "Final Report" document to ensure it
> matches what people think we agreed upon in our last teleconference.
>
>   <burn> document is
>   [11]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110503.html
>
>     [11] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110503.html
>
>   Burn: comments on the document - added general design decisions - 17
>   new discussion bullets.
>
> Determine if we already have other agreed-upon design decisions.
>
>   Bjorn: discussion topic about mic capture access. Propose design
>   agreement - should be possible to start speech reco without
>   selecting mic - just pick default.
>
>   Burn: default vs. what you can do - two things.
>
>   Bjorn: There should be a default mic. Perhaps the only option.
>
>   Burn: saying explicit determination of mic should not be required.
>
>   Bjorn: Should not need to enumerate mics before starting.
>
>   Robert: Think we let you pick other mics.
>   ... that's a reasonable interpretation.
>   ... By default, mic provided by user agent default device.
>
>   Bjorn: Need to discuss second sentence later - picking a mic.
>   ... should be able to start reco without selecting mic - confirming
>   agreement.
>
>   Robert: Assuming that the default will be used for mic.
>
>   Burn: notion of default mic.
>
>   Robert: Issue of user interface. Shows speaker activity. Is there a
>   default user interface? Can the application override.
>
>   Bjorn: Have that requirement for default user interface.
>
>   Robert: RE: default user interface - shows it's listening and lets
>   user cancel.
>
>   Olli: What is the default user interface. Something in the browser.
>
>   Bjorn: Should only use browser user interface. No Web app user
>   interface.
>
>   Olli: More security or privacy concerns otherwise.
>
>   DanD: worried about limitations of only in the browser.
>
>   Robert: Don't think that's true.
>   ... Default user interface. Can it be overridden. Where does it
>   live. 3 discussions.
>   ... Google right in the Web page where the user clicks. Up to user
>   agent to decide how to render.
>
>   Bjorn: Have a default interface now.
>
>   MichaelJ: Fine for default. Want APIs to allow someone to build
>   their own. Different user experience. Allow this. Useful to have
>   default. But not always appropriate.
>
>   Bjorn: Have agreement on default. Have disagreement on your own due
>   to security reasons, etc.
>
>   MichaelJ: Very limiting otherwise.
>
>   Bjorn: Should start speech by custom ways including JavaScript. Can
>   hide that you're capturing audio if custom UI.
>
>   Robert: Compromise - default UI parameterized? Provide feedback to
>   the user. Style sheet. Look at customizations.
>
>   MichaelB: Up to user agent to allow customization. Part of
>   permissions API.
>
>   Burn: Should be a default user interface.
>   ... Should there be customization and what level.
>
>   DanD: Not all use cases in browsers. Different security concerns if
>   rendering engine used. Should not be forced by HTML spec to have a
>   particular UI.
>   ... Don't want to prevent animated character app that is listening
>   to you.
>
>   Bjorn: Talk about browser case. Need to be clear that the browser is
>   capturing the audio.
>
>   DanD: Could be a matter of security settings.
>
>   Bjorn: Don't say that we disallow customization, but don't require
>   this.
>
>   DanD: End up with fragmentation. Won't work cross browser.
>
>   Bjorn: Allow for non-browser apps.
>   ... Note for future discussion.
>   ... Allow customization of the user interface that shows audio
>   capture is happening.
>
>   Burn: Have a discussion topic of the level of customization allowed.
>
>   Bjorn: Should have customization for the UI for starting
>   recognition. Have discussion topic: customize UI for showing that
>   audio is being captured.
>
>   MichaelJ: Waveform, traffic lights?
>
>   Bjorn: Can app customize what the app looks like?
>
>   MichaelJ: Can customize one that shows up in the UI.
>   ... Multimodal tap and talk API. Want creativity. Activate
>   recognition button. Don't want to rule out certain kinds of APIs.
>   Don't want built-in browser feedback to interfere.
>
>   Burn: come back to this discussion later.
>
> Begin discussing issues listed in the Appendix.
>
>   Burn: Have time to discuss a serious topic. Can work out serious
>   issues at FTF.
>   ... Determine which topics have more meat. Start with audio.
>   ... 3 audio related topics. How to get audio capture access.
>   Mandatory audio codecs. Audio streaming support and how.
>
>   Bjorn: 1st unrelated to 2nd two. 1st is API. 2nd two how audio is
>   sent from browser to implementation.
>
>   Burn: How to get audio mic capture access.
>
>   Bjorn: MS proposal has mic selection. What are use cases?
>
>   <burn> "audio mic capture" is "audio/mic/capture"
>
>   Robert: Browser going to have mic API anyway. Avoid 2 mic APIs. 1 in
>   speech and another unrelated (explicit). Want speech API to integrate
>   with browser API.
>   ... Many devices will have mult. mics. Important to select the one
>   you want. Maybe app or user through preferences.
>   ... May want to configure mic settings. Use for things other than
>   speech. E.g. video app that does speech reco.
>   ... MS API allows this. Can get audio stream to reco. Look at
>   multimodal scenarios. Need for integrated API there. Speech API
>   should integrate.
>
>   Bjorn: Can buy most of that.
>   ... If there is one there, should be able to use for speech. But no
>   such standard API yet.
>
>   Robert: Pushing capture API heavily. With michael. IE team thinks
>   this is a sound approach.
>
>   Burn: Agree ability to select diff. audio sources.
>
>   Robert: Not quite it. If browser has mic API - we should be able to
>   use it.
>
>   Bjorn: Agree. But if not one, don't want to come up with one
>   ourself.
>
>   Olli: agree.
>
>   Bjorn: If HTML standard has one, we should be able to use it.
>
>   Robert: Fine with HTML rather than browser.
>
>   Burn: Meta decision. Use HTML if exists, but not create one.
>
>   Robert: Have requirements for such an API?
>   ... Latest draft doesn't have notion of stream of endpointing. And
>   we care deeply about these for mic API.
>
>   Bjorn: Why does mic API need endpointing?
>
>   Robert: Can be a long way between mic and endpointer.
>
>   <burn> should "stream of endpointing" be "stream or endpointing"?
>
>   Bjorn: Requirement that endpointing be available for things other
>   than speech.
>
>   Michael: Hopefully, have agreement - will work with people designing
>   the API and express requirements.
>
>   Bjorn: Seems fair.
>
>   Olli: Capture API in HTML draft or draft working group.
>
>   Robert: Mean the one in the DAP working group.
>
>   Bjorn: Think we should work with HTML.
>
>   Burn: 2nd one tricky. Wrote we will capture an express requirement
>   on a capture API to relevant groups.
>
>   Bjorn: Seems reasonable. Avoid "capture".
>
>   Burn: requirements on audio capture APIs.
>   ... requirements on all audio capture APIs.
>
>   Bjorn: seems fine.
>
>   <mbodell> Olli, is there a capture API in the w3c HTML draft? I
>   don't see it at [12]http://dev.w3.org/html5/spec/Overview.html
>
>     [12] http://dev.w3.org/html5/spec/Overview.html
>
>   <smaug> mbodell: I don't read that version of HTML spec ;)
>
>   Bjorn: If no HTML audio capture API. Propose that we proceed even
>   without a mic API.
>
>   <smaug> mbodell:
>   [13]http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication
>   is an early draft
>
>     [13] http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication
>
>   <burn> (now Robert is speaking)
>
>   Robert: Concern - browsers will need to implement privacy and
>   security policies. Weird to have for speech alone, but not audio
>   capture in general. May be messy.
>
>   Bjorn: Forge ahead, and consider audio capture in general.
>
>   Burn: Agreement that's important.
>
>   Bjorn: Having control over audio capture does not have to be in the
>   first proposal.
>
>   Burn: Is that the consensus?
>
>   Bjorn: OK to have speech API if there is not an audio capture API.
>
>   Robert: Not create one, and shouldn't be blocked from moving
>   forward.
>
>   Burn: Not create one and not block while waiting for one.
>
>   Michael: May design suboptimal if no audio capture API and may not
>   fit well once it's there.
>   ... Premature to jump to say we can make total progress without
>   that.
>
>   DanD: Goal for group to submit the requirements to the other working
>   groups. Accelerating the capture API for audio may be one of the
>   recommendations. AT&T member of DAP. Recognize needs.
>
>   Bjorn: Agree we should not block this progress while waiting.
>
>   DanD: May create fragmentation.
>   ... Unless abstracted completely to "get mic".
>
>   Bjorn: Agreed that we should start reco without specifying mic.
>
>   DanD: Concerned that we should avoid fragmentation.
>
>   Burn: Good to get agreement.
>
>   DanD: API for capture, if we are able to capture the audio without
>   web developer going through coding, then we are fine.
>   ... If anything specific in the web application to retrieve the
>   audio handle, then we're looking for if-then-else statements.
>
>   Bjorn: We would like to do the former.
>
>   Burn: What is meant by "start of speech", "end of speech", and
>   endpointing in general? How do transmission delays affect the
>   definitions and what we want in terms of APIs?
>
>   Robert: Divide into smaller topics. Distributed env., with speech
>   services remote. 2 notions of endpointing: by reco or cheap on client
>   (responsiveness and reduced network IO). Look at these 2 as
>   separate.
>
>   Bjorn: Throw out proposal. Require client-side simple endpointer?
>
>   Robert: Has my vote.
>
>   Burn: No endpointer on my computer.
>
>   Bjorn: Browser could do simple energy-based end pointing.
>
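
A simple energy-based endpointer of the kind Bjorn mentions could look roughly like the TypeScript sketch below. Everything in it is an assumption made for illustration: the names `detectEndpoints` and `EndpointerOptions`, the RMS threshold, and the frame counts are hypothetical and were not proposed in the meeting.

```typescript
// Hypothetical sketch of an energy-based endpointer; not part of any proposal.
type EndpointEvent = { type: "start" | "end"; frame: number };

interface EndpointerOptions {
  threshold: number;      // RMS level at or above which a frame counts as loud
  minLoudFrames: number;  // consecutive loud frames needed to report a start
  minQuietFrames: number; // consecutive quiet frames needed to report an end
}

// Root-mean-square energy of one frame of samples in the range [-1, 1].
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

function detectEndpoints(frames: number[][], opts: EndpointerOptions): EndpointEvent[] {
  const events: EndpointEvent[] = [];
  let inSound = false;
  let run = 0; // consecutive frames that contradict the current state

  frames.forEach((frame, i) => {
    const loud = rms(frame) >= opts.threshold;
    if (loud === inSound) {
      run = 0; // frame agrees with the current state, so reset the counter
      return;
    }
    run++;
    const needed = inSound ? opts.minQuietFrames : opts.minLoudFrames;
    if (run >= needed) {
      inSound = !inSound;
      // Report the index of the frame where the run began.
      events.push({ type: inSound ? "start" : "end", frame: i - run + 1 });
      run = 0;
    }
  });
  return events;
}
```

Requiring several consecutive frames before switching state keeps a single click or a short pause from triggering an event, at the cost of a few frames of latency.
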
>   Robert: Lots of options. GSM encoder has endpointer. Can have local
>   reco and use for endpointer.
>
>   Burn: API needs to assume client as well as server-side endpointer.
>   client could be null op?
>
>   Bjorn: Stronger: has to be something in the client that does tell
>   start and end of speech. even if not good.
>
>   Michael: Can see recommending. Don't know how web author can know.
>   requirement is low latency. doesn't matter after that.
>
>   Bjorn: Agree with that. But if app points to specific recognizer,
>   can interact.
>
>   Burn: Why concerned. Reco can get finicky about input based on
>   training. Endpointing is mostly done in advance. Be careful about
>   requiring local endpointing. If bad, can affect reco.
>
>   Bjorn: Avoid bad endpointers.
>   ... Low latency speech detection should always be available.
>
>   MichaelJ: But not forced to use it. FedEx example: some query -
>   using endpointing from reco - want them to be able to use the
>   standard. Client endpointing could cause errors.
>
>   Bjorn: Have some parameters. Make it easier for the app. Think
>   you're speaking.
>
>   Burn: Ongoing recognition case - won't use local endpointer.
>   ... plenty of open mic apps - listen for keywords.
>
>   Bjorn: Should be one, but should be possible for app to turn off.
>
>   Robert: probably want app to turn it on if it needs it.
>
>   Michael: Set a parameter and get it that way.
>
>   Bjorn: Hello world app.
>
>   Charles: Level for feedback - good to be local.
>
>   Burn: Low latency endpoint detector should be available.
>
>   Bjorn: Don't have agreement if on or off by default.
>
>   MichaelJ: Talking about detection of end of speech or start too?
>
>   Burn: may be big difference.
>   ... Want low latency to turn on speech to reco - but don't want it
>   to stop.
>
>   Bjorn: we do the opposite.
>   ... Start streaming right away, server endpoints, but need to stop
>   streaming at some point.
>
>   Robert: very scenario dependent. Need start stop speech event. Start
>   when click of button, end matters a lot. Need to have options
>   available.
>
>   Burn: Forwarding audio to expensive recognizers. Want high accuracy
>   on end pointing. Don't want to send audio unless we have to due to
>   expense.
>
>   Bjorn: Cutting off audio vs. endpointer. Can not listen for the
>   event. Control if endpointing cuts off audio.
>
>   MichaelJ: Need to control when start sending audio to recognizer.
>
>   Burn: Start speech and reco can be different.
>
>   MichaelJ: If reco on for a long time, may want to do something to
>   delay until there is certainty of speech.
>
>   Bjorn: Agree that a low latency endpointer is available.
>   Should be possible for app to decide if audio is started or stopped
>   on endpointer.
>
>   Burn: Audio start/stop separate from speech start/stop. Separately
>   controllable.
>   ... Detector detects both start/end of speech and fires an event in
>   each case.
>
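
One way to picture "separately controllable" is a gate between captured audio and the recognizer: the detector always fires its start/end events, and the application chooses whether those events also start or stop the audio sent for recognition. The TypeScript below is only a sketch under that reading. The class `RecoAudioGate` and the option names `waitForSpeechStart` and `stopOnSpeechEnd` are hypothetical and do not come from the discussion.

```typescript
// Hypothetical sketch: detector events are always reported, while the
// application decides whether they also gate the audio sent to the recognizer.
type DetectorEvent = "speechstart" | "speechend";

interface GateOptions {
  waitForSpeechStart: boolean; // hold audio back until speech is detected
  stopOnSpeechEnd: boolean;    // cut audio off when end of speech is detected
}

class RecoAudioGate {
  private forwarding: boolean;
  private readonly opts: GateOptions;
  readonly seen: DetectorEvent[] = []; // every event is visible to the app

  constructor(opts: GateOptions) {
    this.opts = opts;
    // With no waiting, audio flows to the recognizer from the first frame.
    this.forwarding = !opts.waitForSpeechStart;
  }

  onDetectorEvent(event: DetectorEvent): void {
    this.seen.push(event);
    if (event === "speechstart" && this.opts.waitForSpeechStart) {
      this.forwarding = true;
    }
    if (event === "speechend" && this.opts.stopOnSpeechEnd) {
      this.forwarding = false;
    }
  }

  // True while captured frames should be forwarded to the recognizer.
  shouldForward(): boolean {
    return this.forwarding;
  }
}
```

With both options set to `false` the gate behaves like the open-mic case raised earlier, where the recognizer keeps receiving audio regardless of the local detector. With both set to `true` it behaves like the prefilter case, where audio is sent only between detected start and end of speech.
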
>   Bjorn: Separate issue of cutting off audio.
>
>   Burn: Audio to the reco process as opposed to TTS.
>   ... Audio start and stop to reco server (resource)...
>
>   Bjorn: Control over which audio is used for speech recognition.
>   ... which part of the captured audio.
>
>   DanD: Make sure we carefully agree that we are not forcing the
>   application into using the predefined environment engine of the
>   browser and still allow developer to choose which engine to use.
>
>   DanD: have a flag. If use optimized endpointing in application or
>   not.
>
>   Bjorn: Separate from how you choose the engine.
>
>   MichaelJ: Related - if turned on, give some sort of event for local
>   prediction of begin/end of speech, is that the resolution we want?
>   If level detector, can also get level?
>
>   Bjorn: Should be a more precise way to get actual events from
>   recognizer. Level part of mic API?
>
>   MichaelJ: Could be raw energy detector, limited reco listening for
>   "silence", etc. for the local part. The browser, client side, can
>   have best that it can. Not saying anything about how it's done.
>
>   Burn: May be a difference when there are multiple endpointers. (1)
>   low latency - prefilter to decide if goes to reco, (2) high quality
>   in engine.
>   ... Would want recognizer's endpoint detector. But preprocess one is
>   the low latency one.
>
>   Bjorn: 2 events: 1 probable vs. actual start/end of speech.
>
>   MichaelJ: Talking now vs. not. More going on underneath. Get
>   complicated to expose underneath if varies by implementation. Energy
>   level might drive aspects of the API.
>
>   Burn: Why want distinction? Mic open is one option. Another is that
>   engine is paying attention. Another is that engine found something
>   important.
>   ... Might decide that it's not hearing anything.
>
>   Bjorn: Started capture, think starting, actually starting. 1st 2 go
>   in the UI. Good to have last for timing.
>
>   Burn: In VXML2, have hot word detection. Concluded it doesn't act as
>   if speech is detected until something happens. Acts as if nothing
>   happened if nothing reco'd. May collapse 2nd and 3rd states.
>
>   Bjorn: Thought we had agreement earlier.
>
>   Burn: Agreed we had some sort of start and end. Knew we needed to
>   discuss it.
>
>   Bjorn: 3.3.3. - onspeechstart/end/error. Need to add more to this
>   list.
>   ... propose adding onaudiostart onaudioend, and split onspeechstart
>   to detected vs. actual (reco).
>
>   MichaelJ: energy vs. reco? split
>
>   Bjorn: Could be confusing.
>
>   MichaelB: onsoundstart?
>
>   Bjorn: sounds like a good name.
>
>   MichaelJ: Issues of calibration? Sensitivity parameters? Used on
>   mobile phones or elsewhere. Might need calibration to work well.
>
>   ???: Sensitivity and timeout parameters.
>
>   Burn: Whole topic to discuss parameters.
>
>   Bjorn: Discuss parameters in context.
>   ... Agree on adding these events?
>
>   Burn: We will add onaudiostart/end ... Dan will cut and paste here?
>
>   Bjorn: onsoundstart/end should be low latency. Also say something
>   about order.
>
>   Burn: OK. audiostart, soundstart, speechstart, speechend, soundend,
>   audioend
>
>   Bjorn: Might not get soundstart or speechstart.
>
>   Burn: onsoundstart, require soundend.
>   ... soundend optional?
>
>   Bjorn: not true.
>   ... Can't have onspeechstart without the preceding two.
>
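
The ordering discussed here (audiostart, soundstart, speechstart, speechend, soundend, audioend, with soundstart and speechstart possibly absent, and no speechstart without the two events before it) can be written down as a small checker. The TypeScript below is a sketch only. The function name `checkEventOrder` is hypothetical, and the rule that each end event needs its matching start event is an assumption of this sketch: the pairing of start and end events and the behavior on error were left open in the meeting.

```typescript
// Hypothetical checker for the event order discussed in the minutes.
const ORDER = [
  "audiostart", "soundstart", "speechstart",
  "speechend", "soundend", "audioend",
] as const;
type SpeechEventName = typeof ORDER[number];

// Events that must already have fired before the key event may fire.
// The three "end" entries are an assumption of this sketch, not an agreed rule.
const REQUIRES: Record<SpeechEventName, SpeechEventName[]> = {
  audiostart: [],
  soundstart: ["audiostart"],
  speechstart: ["audiostart", "soundstart"],
  speechend: ["speechstart"],
  soundend: ["soundstart"],
  audioend: ["audiostart"],
};

// Returns null when the sequence is acceptable, otherwise a short reason.
function checkEventOrder(events: SpeechEventName[]): string | null {
  const seen = new Set<SpeechEventName>();
  let lastRank = -1;
  for (const name of events) {
    const rank = ORDER.indexOf(name);
    if (rank <= lastRank) return `${name} is repeated or out of order`;
    for (const needed of REQUIRES[name]) {
      if (!seen.has(needed)) return `${name} fired without ${needed}`;
    }
    seen.add(name);
    lastRank = rank;
  }
  return null;
}
```

For example, `checkEventOrder(["audiostart", "audioend"])` is accepted because sound and speech may never be detected, while `checkEventOrder(["audiostart", "speechstart"])` is rejected because soundstart is missing.
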
>   Charles: Want end events with start events.
>
>   Burn: Can have ends all at the same time.
>
>   Bjorn: what if onerror?
>
>   Burn: Great topic.
>   ... capture that as issue for discussion.
>   ... what happens to audio, sound and speech events in case of error.
>
>   Bjorn: Also sensitivity discussion point. And timeout parameters for
>   ASR.
>
>   Burn: Meeting next week. Can have call after that. Meeting after
>   that. 2 days of meeting.
>
>
>
Received on Thursday, 12 May 2011 15:19:05 GMT