Re: Access to live/raw audio and video stream data from both local and remote sources


I'm sorry for the late answer. The W3C DAP and WebRTC chairs have
discussed this and come to the following conclusions:

- The WebRTC WG deals with access to live (audio and video) streams, and
also currently has support for local recording of them in the API
proposal [1].

- DAP has a note about the <device> element in the HTML Media Capture 
draft, but the <device> element has been replaced by "getUserMedia" [1].

- In the WebRTC charter there are references to DAP regarding device
exploration and media capturing, as those were deemed to be in DAP scope
at the time the WebRTC charter was written. This has since been
resolved: media streams will be handled by WebRTC.

- WebRTC is planning coordination with the Audio WG to ensure alignment 
regarding media streams.

A question: what do you mean by "raw" audio and video stream data? The
MediaStreams discussed in WebRTC are more like logical references (which
you can attach to audio/video elements for rendering, to a
PeerConnection for streaming to a peer, and so on).
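To illustrate the "logical reference" point, here is a minimal TypeScript sketch written against the API as it was later standardized (`navigator.mediaDevices.getUserMedia()` and the `srcObject` property of media elements), not the 2011 draft under discussion. Script only obtains the stream object and hands it to a sink; no sample data passes through script. The `StreamLike`, `MediaDevicesLike` and `MediaSink` interfaces and the `attachCamera` helper are hypothetical, introduced so the sketch compiles and can be exercised without a browser.

```typescript
// Hypothetical minimal shapes standing in for the DOM's MediaStream,
// MediaDevices and HTMLVideoElement, so the sketch compiles without a browser.
interface StreamLike {
  id: string;
}

interface MediaDevicesLike<S extends StreamLike> {
  getUserMedia(constraints: { audio?: boolean; video?: boolean }): Promise<S>;
}

interface MediaSink {
  srcObject: unknown; // on a real <video> element: MediaStream | MediaSource | Blob | null
}

// Request a camera stream and attach the *reference* to a media element.
// The element renders the frames; script never sees the pixel data here.
function attachCamera<S extends StreamLike>(
  devices: MediaDevicesLike<S>,
  sink: MediaSink,
): Promise<S> {
  return devices.getUserMedia({ video: true, audio: false }).then((stream) => {
    sink.srcObject = stream;
    return stream;
  });
}
```

In a browser this would be called roughly as `attachCamera(navigator.mediaDevices, videoElement)`. The same stream object could equally be handed to a peer connection instead of an element, which is what makes it a reference rather than a buffer of samples.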

Stefan (for the DAP and WebRTC chairs).


On 2011-07-27 02:56, Rob Manson wrote:
> Hi,
> sorry for posting across multiple groups, but I hope you'll see from my
> comments below that this is really needed.
> This is definitely not intended as criticism of any of the work going
> on.  It's intended as constructive feedback that hopefully provides
> clarification on a key use case and its supporting requirements.
>          "Access to live/raw audio and video stream data from both local
>          and remote sources in a consistent way"
> I've spent quite a bit of time trying to follow a clear thread of
> requirements/solutions that provide API access to raw stream data (e.g.
> audio, video, etc.).  But I'm a bit concerned this is falling in the gap
> between the DAP and RTC WGs.  If this is not the case then please point
> me to the relevant docs and I'll happily get back in my box 8)
> Here's how the thread seems to flow at the moment based on public
> documents.
> On the DAP page [1] the mission states:
>          "the Device APIs and Policy Working Group is to create
>          client-side APIs that enable the development of Web Applications
>          and Web Widgets that interact with devices services such as
>          Calendar, Contacts, Camera, etc"
> So it seems clear that this is the place to start.  Further down that
> page the "HTML Media Capture" and "Media Capture" APIs are listed.
> HTML Media Capture (camera/microphone interactions through HTML forms)
> initially seems like a good candidate, however the intro in the latest
> PWD [2] clearly states:
>          "Providing streaming access to these capabilities is outside of
>          the scope of this specification."
> Followed by a NOTE that states:
>          "The Working Group is investigating the opportunity to specify
>          streaming access via the proposed <device> element."
> The link on the "proposed <device> element" [3] links to a "no longer
> maintained" document that then redirects to the top level of the whatwg
> "current work" page [4].  On that page the most relevant link is the
> video conferencing and peer-to-peer communication section [5].  More
> about that further below.
> So back to the DAP page to explore the other Media Capture API
> (programmatic access to camera/microphone) [1] and its latest PWD [6].
> The abstract states:
>          "This specification defines an Application Programming Interface
>          (API) that provides access to the audio, image and video capture
>          capabilities of the device."
> And the introduction states:
>          "The Capture API defines a high-level interface for accessing
>          the microphone and camera of a hosting device. It completes the
>          HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE]
>          with a programmatic access to start a parametrized capture
>          process."
> So it seems clear that this is not related to streams in any way either.
> The Notes column for this API on the DAP page [1] also states:
>          "Programmatic API that completes the form based approach
>          Need to check if still interest in this
>          How does it relate with the Web RTC Working Group?"
> Is there an updated position on this?
> So if you then head over to the WebRTC WG's charter [7] it states:
>          " define client-side APIs to enable Real-Time
>          Communications in Web browsers.
>          These APIs should enable building applications that can be run
>          inside a browser, requiring no extra downloads or plugins, that
>          allow communication between parties using audio, video and
>          supplementary real-time communication, without having to use
>          intervening servers..."
> So this is clearly focused upon peer-to-peer communication "between"
> systems and the stream related access is naturally just treated as an
> ancillary requirement.  The scope section then states:
>          "Enabling real-time communications between Web browsers require
>          the following client-side technologies to be available:
>          - API functions to explore device capabilities, e.g. camera,
>          microphone, speakers (currently in scope for the Device APIs &
>          Policy Working Group)
>          - API functions to capture media from local devices (camera and
>          microphone) (currently in scope for the Device APIs & Policy
>          Working Group)
>          - API functions for encoding and other processing of those media
>          streams,
>          - API functions for establishing direct peer-to-peer
>          connections, including firewall/NAT traversal
>          - API functions for decoding and processing (including echo
>          cancelling, stream synchronization and a number of other
>          functions) of those streams at the incoming end,
>          - Delivery to the user of those media streams via local screens
>          and audio output devices (partially covered with HTML5)"
> So this is where I really start to feel the gap growing.  The DAP is
> pointing to RTC saying it's not sure if its Camera/Microphone APIs are
> being superseded by the work in the RTC... and the RTC then points back
> to say it will be relying on work in the DAP.  However, the RTC's
> Recommended Track Deliverables list does include:
>          "Media Stream Functions, Audio Stream Functions and Video Stream
>          Functions"
> So then it's back to the whatwg MediaStream and LocalMediaStream current
> work [8].  Following this through you end up back at the <audio> and
> <video> media elements with some brief discussion about media data [9].
> Currently the only API that I'm aware of that allows live access to the
> audio data through the <audio> tag is the relatively proprietary Mozilla
> Audio Data API [10].
> And while the video stream data can be accessed by rendering each frame
> into a canvas 2d graphics context and then using getImageData to extract
> and manipulate it from there [11], this seems more like a work around
> than an elegantly designed solution.
> As I said above, this is not intended as a criticism of the work that
> the DAP WG, WebRTC WG or WHATWG are doing.  It's intended as
> constructive feedback to highlight that the important use case of
> "Access to live/raw audio and video stream data from both local and
> remote sources" appears to be falling in the gaps between the groups.
> From my perspective this is a critical use case for many advanced web
> apps that will help bring them in line with what's possible in the
> native, single-vendor-stack-based apps at the moment (e.g. iPhone &
> Android).  And it's also critical for the advancement of web standards
> based AR applications and other computer vision, hearing and signal
> processing applications.
> I understand that a lot of these specifications I've covered are in very
> formative stages and that requirements and PWDs are just being drafted
> as I write.  And that's exactly why I'm raising this as a single and
> consolidated perspective that spans all these groups.  I hope this goes
> some way towards "Access to live/raw audio and video stream data from
> both local and remote sources" being treated as an essential and core
> use case that binds together the work of all these groups.  With a clear
> vision for this and a little consolidated work I think this will then
> also open up a wide range of other app opportunities that we haven't
> even thought of yet.  But at the moment it really feels like this is
> being treated as an assumed requirement and could end up as a poorly
> formed second class bundle of semi-related API hooks.
> For this use case I'd really like these clear requirements to be
> supported:
> - access the raw stream data for both audio and video in similar ways
> - access the raw stream data from both remote and local streams in
> similar ways
> - ability to inject new data or the transformed original data back into
> streams and presented audio/video tags in a consistent way
> - all of this should be optimised for performance to meet the demands of live
> signal processing
> roBman
> PS: I've also cc'd in the mozilla dev list as I think this directly
> relates to the current "booting to the web" thread [12]
> [1]
> [2]
> [3]
> [4]
> [5]
> [6]
> [7]
> [8]
> [9]
> [10]
> [11]
> [12]
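The canvas workaround mentioned in the quoted message (drawing each video frame into a 2D canvas context and reading the pixels back with `getImageData`) can be sketched in TypeScript roughly as follows. `drawImage`, `getImageData`, `putImageData` and `requestAnimationFrame` are standard browser APIs; the `PixelFrame`, `VideoLike` and `Context2DLike` interfaces and the `grabFrame` and `toGreyscale` helpers are hypothetical, with the interfaces standing in for the DOM types so the sketch compiles and can be tested outside a browser. The greyscale step is just an example of per-frame processing.

```typescript
// Hypothetical minimal shapes standing in for the DOM's ImageData,
// HTMLVideoElement and CanvasRenderingContext2D.
interface PixelFrame {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel, row-major
}

interface VideoLike {
  videoWidth: number; // 0 until the video's metadata has loaded
  videoHeight: number;
}

interface Context2DLike<F extends PixelFrame> {
  drawImage(image: unknown, dx: number, dy: number): void;
  getImageData(sx: number, sy: number, sw: number, sh: number): F;
}

// Copy the video's current frame onto the canvas, then read the pixels back.
// Assumes the canvas has already been sized to videoWidth x videoHeight.
function grabFrame<F extends PixelFrame>(video: VideoLike, ctx: Context2DLike<F>): F {
  ctx.drawImage(video, 0, 0);
  return ctx.getImageData(0, 0, video.videoWidth, video.videoHeight);
}

// Example per-frame processing: convert to greyscale in place using
// Rec. 601 luma weights. Alpha (every 4th byte) is left untouched.
function toGreyscale<F extends PixelFrame>(frame: F): F {
  const d = frame.data;
  for (let i = 0; i < d.length; i += 4) {
    const luma = 0.299 * d[i] + 0.587 * d[i + 1] + 0.114 * d[i + 2];
    // Uint8ClampedArray rounds and clamps each stored value to 0..255.
    d[i] = luma;
    d[i + 1] = luma;
    d[i + 2] = luma;
  }
  return frame;
}
```

In a browser the helpers would typically be driven from a `requestAnimationFrame` loop, with something like `ctx.putImageData(toGreyscale(grabFrame(video, ctx)), 0, 0)` on each tick. Every frame is copied out of the media pipeline and back again. Also, a canvas that has had a cross-origin video drawn into it without CORS approval becomes tainted, and `getImageData` then throws a `SecurityError`, which bears directly on the remote-source case raised in the message.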

Received on Wednesday, 24 August 2011 06:41:33 UTC