- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Fri, 14 Jan 2011 11:16:35 -0500
- To: <public-hypertext-cg@w3.org>
- Cc: <w3c-html-cg@w3.org>, "'Michael Bodell'" <mbodell@microsoft.com>, "'Dave Burke'" <daveburke@google.com>
Minutes from the discussion of Audio on the Web
http://www.w3.org/2011/01/14-hcg-minutes.html
and below as text.
Note that there are some member-only links, but it should
be possible to follow the discussion from the minutes.
[1]W3C
[1] http://www.w3.org/
- DRAFT -
Hypertext Coordination Group Teleconference
14 Jan 2011
See also: [2]IRC log
[2] http://www.w3.org/2011/01/14-hcg-irc
Attendees
Present
Michael_Bodell, Dan_Burnett, Janina, ChrisL, Art_Barstow,
Steven, Kaz, Debbie_Dahl, glazou, darobin, Shepazu, Bert
Regrets
Cameron_McCormack, Erik_Dahlström, Art_Barstow,
Charles_McCathieNevile, Frederick_Hirsch, Lofton_Henderson
Chair
Chris
Scribe
ddahl
Contents
* [3]Topics
1. [4]Audio on the Web
* [5]Summary of Action Items
_________________________________________________________
<trackbot> Date: 14 January 2011
<glazou> "this passcode is not valid"
<glazou> ah finally
<ChrisL> glazou, i used 4824 and it worked
<glazou> ah that was the french bridge
<glazou> the US one is in better shape
<glazou> good start for the new year darobin
<ddahl1> scribe: ddahl
<ddahl1> chair: ChrisL
Audio on the Web
<ChrisL>
[6]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0005.html
[6] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0005.html
<ddahl1> michaelB: in MMI, VoiceBrowser, and HTML SpeechXG. wants to
capture audio in a way that user can interact with it. some
proposals have capture and then upload, but that doesn't satisfy our
use case.
<ddahl1> chris: other requirements for speech?
<ddahl1> michaelB: yes, endpointing, echo cancellation, playback for
speech synthesis, and tying playback to barge-in
<glazou> Bert: the french bridge...
<ddahl1> the French bridge is hosed, call in on the US one if you
can
<ChrisL>
[7]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0003.html
[7] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0003.html
<ddahl1> janina: our requirements came from making HTML 5 video and
audio accessible
<ddahl1> ...video description uses secondary audio channel, used in
broadcasting, different on the web, also looking for a way to play
two binary resources, not necessarily the same length
<ddahl1> ...another one is the need to control volume and panning
separately, or direct them to a secondary audio device
<ddahl1> chrisL: if something's being broadcast to a group, then
different people might have different needs
<ddahl1> doug: how would that work?
<ddahl1> ...are you accessing different devices, how would you
discover different devices?
<ddahl1> janina: I don't know if the browser knows, but the OS
knows. I know it's discoverable on Linux, mapping the OS resources
to the browser
<ddahl1> chrisL: a kind of labeling so that different things go to
different devices
<ddahl1> ...also synchronization of multiple audio streams, SMIL
does this, but not HTML5 audio
<ddahl1> janina: HTML5 seems to assume that files have the same
timespan, but that might not be true for video description or for
different languages
<ddahl1> chrisL: especially problematic for longer files
<ddahl1> janina: SMIL seems to work well, is used in Daisy
Consortium
<ddahl1> ...we could take as much of SMIL for the use cases we need
and leave the rest behind
<ddahl1> chrisL: similar to what we did with SVG
<ddahl1> doug: Audio XG -- audio api is an api for reading and
writing to the live audio stream, one implementation that will be in
Firefox, we just give access to the raw bits, a more sophisticated
implementation in WebKit, also has a higher-level ability to
manipulate audio in the browser
<ddahl1> ...we will make a WG, have been mostly talking about WebKit
approach, should things be done in the browser or with script
libraries
<ddahl1> chrisL: is script fast enough for helper methods?
<ddahl1> doug: I don't know, would be better to use helper methods
in mobile devices because of processing constraints
<ChrisL>
[8]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0004.html
[8] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0004.html
<inserted> scribenick: ChrisL
ddahl: Our primary use case involving audio is input and output of
speech, mainly for interaction
... but also recording, like for voicemail. so need to capture
speech and to stream it
... not just batch capture
... support arbitrary processing - speech recognition, speech
understanding, speech-to-speech translation, emotion detection,
speaker verification, language/gender/age identification, medical
diagnosis
ddahl: will not support arbitrary translation
... need to control format and sampling rate
... capture speech on mobile or desktop or over telephone (last is a
VB requirement)
... Able to combine semantics of speech with other inputs, like
circling an area and saying "Italian restaurants near here"
ddahl: control volume of output, pause and resume
... local or distributed cloud-based processing
... audio file output, tts, positioning of inputs and outputs
... multiple microphones? like a big meeting room and record the
whole meeting
ChrisL: multichannel or mixing?
ddahl: both
... no use cases around capturing non-speech audio, for mm, but
important for others
ChrisL: ability to determine if an audio input is speech or
non-speech
<kaz> scribenick: ddahl
<inserted> scribenick: ddahl1
michaelB: also have concerns around security and privacy
... a microphone is like a keyboard, what are user expectations and
behavior
... need to mix with functional requirements
chrisL: you can imagine some way of notifying the user that speech
is being recorded.
janina: in the news today was a story about spyware on smartphones
michael: also need to be able to notify user in non-visual
environments
doug: maybe a vibratory signal could signal when microphone is on
... nothing about privacy in the charter, but the spec will mention
privacy
... charter basically has microphone access. lots of discussion
about access to microphone. DAP WG is chartered to do it but hasn't
done it. Audio WG will work on it if necessary
<darobin> DAP is doing something about this
<darobin> RTC will help as well
<darobin> more than happy to work with Audio
chrisL: comments from robin on DAP?
<darobin> and in fact we've done it
<darobin> just not at the level required yet
<darobin> but certainly can push further
<darobin> very basic access: [9]http://dev.w3.org/2009/dap/camera/
[9] http://dev.w3.org/2009/dap/camera/
<darobin> more advanced:
[10]http://dev.w3.org/2009/dap/camera/Overview-API.html
[10] http://dev.w3.org/2009/dap/camera/Overview-API.html
<darobin> and we want to do more advanced still, but will need some
security model for it
<darobin> RTC == real time Web
<darobin> it's not called camera, URIs are opaque dammit :)
chrisL: "camera" spec sounds like it should be visual
<darobin> [11]http://www.w3.org/TR/media-capture-api/
[11] http://www.w3.org/TR/media-capture-api/
<darobin> [12]http://www.w3.org/TR/html-media-capture/
[12] http://www.w3.org/TR/html-media-capture/
<darobin> (same links, for people who read URIs)
<darobin> that is correct
michaelB: doesn't cover the streaming case for audio
<darobin> we're working on that, but it's harder security wise
<darobin> we're also synching with HTML WG
chrisL: separate specs, capture vs. streaming?
<darobin> yes, they build atop one another whenever possible
michaelB: maybe could be separate, but could be the same spec.
working on proposals in HTML-SpeechXG, reviewing proposals
<darobin> feeeeeeeeeeeedback
<darobin> we wantsssss feeeeeeeeeeeeeeeeeeeedback
<darobin> I may not be able to speak today, but I can read :)
chrisL: HTML-speech XG should send email to DAP
<darobin> DAP: public-device-apis@w3.org
<ArtB> Web Audio API from Chris Rogers:
[13]http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
[13] http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
Summary of Action Items
[End of minutes]
_________________________________________________________
Received on Friday, 14 January 2011 16:17:15 UTC