Hypertext Coordination Group Minutes, January 14, 2011 from Deborah Dahl on 2011-01-14 (public-hypertext-cg@w3.org from January to March 2011)

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Fri, 14 Jan 2011 11:16:35 -0500
To: <public-hypertext-cg@w3.org>
Cc: <w3c-html-cg@w3.org>, "'Michael Bodell'" <mbodell@microsoft.com>, "'Dave Burke'" <daveburke@google.com>
Message-ID: <00c801cbb406$6ac96ca0$405c45e0$@conversational-technologies.com>
Minutes from the discussion of Audio on the Web
http://www.w3.org/2011/01/14-hcg-minutes.html

and below as text.

Note that there are some member-only links, but it should
be possible to follow the discussion from the minutes.

   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

              Hypertext Coordination Group Teleconference
                              14 Jan 2011

   See also: [2]IRC log

      [2] http://www.w3.org/2011/01/14-hcg-irc


Attendees

   Present
          Michael_Bodell, Dan_Burnett, Janina, ChrisL, Art_Barstow,
          Steven, Kaz, Debbie_Dahl, glazou, darobin, Shepazu, Bert

   Regrets
          Cameron_McCormack, Erik_Dahlström, Art_Barstow,
          Charles_McCathieNevile, Frederick_Hirsch, Lofton_Henderson

   Chair
          Chris

   Scribe
          ddahl

Contents

     * [3]Topics
         1. [4]Audio on the Web
     * [5]Summary of Action Items
     _________________________________________________________

   <trackbot> Date: 14 January 2011

   <glazou> "this passcode is not valid"

   <glazou> ah finally

   <ChrisL> glazou, i used 4824 and it worked

   <glazou> ah that was the french bridge

   <glazou> the US one is in better shape

   <glazou> good start for the new year darobin

   <ddahl1> scribe: ddahl

   <ddahl1> chair: ChrisL

Audio on the Web

   <ChrisL>
   [6]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0005.html

      [6] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0005.html

   <ddahl1> michaelB: in MMI, Voice Browser, and HTML Speech XG, we want
   to capture audio in a way that lets the user interact with it. Some
   proposals have capture and then upload, but that doesn't satisfy our
   use case.

   <ddahl1> chris: other requirements for speech?

   <ddahl1> michaelB: yes, endpointing, echo cancellation, playback for
   speech synthesis, and tying playback to barge-in

   <glazou> Bert: the french bridge...

   <ddahl1> the French bridge is hosed, call in on the US one if you
   can

   <ChrisL>
   [7]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0003.html

      [7] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0003.html

   <ddahl1> janina: our requirements came from making HTML5 video and
   audio accessible

   <ddahl1> ...video description uses a secondary audio channel, as in
   broadcasting, but it's different on the web; also looking for a way
   to play two binary resources together, not necessarily of the same
   length

   <ddahl1> ...another one is the need to control volume and panning
   separately, or direct them to a secondary audio device

   <ddahl1> chrisL: if something's being broadcast to a group, then
   different people might have different needs

   <ddahl1> doug: how would that work?

   <ddahl1> ...are you accessing different devices, how would you
   discover different devices?

   <ddahl1> janina: I don't know if the browser knows, but the OS
   knows. I know it's discoverable on Linux; the issue is mapping the
   OS resources to the browser

   <ddahl1> chrisL: a kind of labeling so that different things go to
   different devices

   <ddahl1> ...also synchronization of multiple audio streams, SMIL
   does this, but not HTML5 audio

   <ddahl1> janina: HTML5 seems to assume that files have the same
   timespan, but that might not be true for video description or for
   different languages

   <ddahl1> chrisL: especially problematic for longer files

   <ddahl1> janina: SMIL seems to work well, is used by the DAISY
   Consortium

   <ddahl1> ...we could take as much of SMIL as we need for our use
   cases and leave the rest behind

   <ddahl1> chrisL: similar to what we did with SVG

   <ddahl1> doug: Audio XG: the audio API is an API for reading and
   writing the live audio stream. One implementation, which will be in
   Firefox, just gives access to the raw bits; a more sophisticated
   implementation in WebKit also has a higher-level ability to
   manipulate audio in the browser

   <ddahl1> ...we will form a WG; have been mostly talking about the
   WebKit approach; question is whether things should be done in the
   browser or with script libraries

   <ddahl1> chrisL: is script fast enough for helper methods?

   <ddahl1> doug: I don't know; it would be better to use helper
   methods on mobile devices because of processing constraints

   <ChrisL>
   [8]http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0004.html

      [8] http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0004.html

   <inserted> scribenick: ChrisL

   ddahl: Our primary use case involving audio is input and output of
   speech, mainly for interaction
   ... but also recording, like for voicemail. So we need to capture
   speech and to stream it
   ... not just batch capture
   ... support arbitrary processing - speech recognition, speech
   understanding, speech-to-speech translation, emotion detection,
   speaker verification, language/gender/age identification, medical
   diagnosis

   ddahl: will not support arbitrary translation
   ... need to control format and sampling rate
   ... capture speech on mobile or desktop or over telephone (last is a
   VB requirement)
   ... Able to combine semantics of speech with other inputs, like
   circling an area and saying "Italian restaurants near here"

   ddahl: control volume of output, pause and resume
   ... local or distributed cloud-based processing
   ... audio file output, TTS, positioning of inputs and outputs
   ... multiple microphones? e.g., in a big meeting room, to record the
   whole meeting

   ChrisL: multichannel or mixing?

   ddahl: both
   ... no use cases around capturing non-speech audio, for mm, but
   important for others

   ChrisL: ability to determine if an audio input is speech or
   non-speech

   <kaz> scribenick: ddahl

   <inserted> scribenick: ddahl1

   michaelB: also have concerns around security and privacy
   ... a microphone is like a keyboard, what are user expectations and
   behavior
   ... need to mix with functional requirements

   chrisL: you can imagine some way of notifying the user that speech
   is being recorded.

   janina: in the news today was a story about spyware on smartphones

   michael: also need to be able to notify user in non-visual
   environments

   doug: maybe a vibratory signal could signal when microphone is on
   ... nothing about privacy in the charter, but the spec will mention
   privacy
   ... charter basically has microphone access. lots of discussion
   about access to microphone. DAP WG is chartered to do it but hasn't
   done it. Audio WG will work on it if necessary

   <darobin> DAP is doing something about this

   <darobin> RTC will help as well

   <darobin> more than happy to work with Audio

   chrisL: comments from robin on DAP?

   <darobin> and in fact we've done it

   <darobin> just not at the level required yet

   <darobin> but certainly can push further

   <darobin> very basic access: [9]http://dev.w3.org/2009/dap/camera/

      [9] http://dev.w3.org/2009/dap/camera/

   <darobin> more advanced:
   [10]http://dev.w3.org/2009/dap/camera/Overview-API.html

     [10] http://dev.w3.org/2009/dap/camera/Overview-API.html

   <darobin> and we want to do more advanced still, but will need some
   security model for it

   <darobin> RTC == real time Web

   <darobin> it's not called camera, URIs are opaque dammit :)

   chrisL: "camera" spec sounds like it should be visual

   <darobin> [11]http://www.w3.org/TR/media-capture-api/

     [11] http://www.w3.org/TR/media-capture-api/

   <darobin> [12]http://www.w3.org/TR/html-media-capture/

     [12] http://www.w3.org/TR/html-media-capture/

   <darobin> (same links, for people who read URIs)

   <darobin> that is correct

   michaelB: doesn't cover the streaming case for audio

   <darobin> we're working on that, but it's harder security wise

   <darobin> we're also synching with HTML WG

   chrisL: separate specs, capture vs. streaming?

   <darobin> yes, they build atop one another whenever possible

   michaelB: maybe could be separate, but could be the same spec.
   working on proposals in HTML-SpeechXG, reviewing proposals

   <darobin> feeeeeeeeeeeedback

   <darobin> we wantsssss feeeeeeeeeeeeeeeeeeeedback

   <darobin> I may not be able to speak today, but I can read :)

   chrisL: HTML-speech XG should send email to DAP

   <darobin> DAP: public-device-apis@w3.org

   <ArtB> Web Audio API from Chris Rogers:
   [13]http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html

     [13] http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html

Summary of Action Items

   [End of minutes]
     _________________________________________________________
Received on Friday, 14 January 2011 16:17:15 UTC