- From: Rob Manson <roBman@mob-labs.com>
- Date: Wed, 27 Jul 2011 10:56:11 +1000
Hi,

sorry for posting across multiple groups, but I hope you'll see from my comments below that this is really needed. This is definitely not intended as criticism of any of the work going on. It's intended as constructive feedback that hopefully provides clarification on a key use case and its supporting requirements:

"Access to live/raw audio and video stream data from both local and remote sources in a consistent way"

I've spent quite a bit of time trying to follow a clear thread of requirements/solutions that provide API access to raw stream data (e.g. audio, video, etc.), but I'm a bit concerned this is falling into the gap between the DAP and RTC WGs. If this is not the case then please point me to the relevant docs and I'll happily get back in my box 8)

Here's how the thread seems to flow at the moment, based on public documents.

On the DAP page [1] the mission states:

"the Device APIs and Policy Working Group is to create client-side APIs that enable the development of Web Applications and Web Widgets that interact with device services such as Calendar, Contacts, Camera, etc"

So it seems clear that this is the place to start. Further down that page the "HTML Media Capture" and "Media Capture" APIs are listed.

HTML Media Capture (camera/microphone interactions through HTML forms) initially seems like a good candidate; however, the intro in the latest PWD [2] clearly states:

"Providing streaming access to these capabilities is outside of the scope of this specification."

followed by a NOTE that states:

"The Working Group is investigating the opportunity to specify streaming access via the proposed <device> element."

The link on the "proposed <device> element" [3] leads to a "no longer maintained" document that then redirects to the top level of the WHATWG "current work" page [4]. On that page the most relevant link is the video conferencing and peer-to-peer communication section [5]. More about that further below.
So it's back to the DAP page [1] to explore the other Media Capture API (programmatic access to camera/microphone) and its latest PWD [6]. The abstract states:

"This specification defines an Application Programming Interface (API) that provides access to the audio, image and video capture capabilities of the device."

And the introduction states:

"The Capture API defines a high-level interface for accessing the microphone and camera of a hosting device. It completes the HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE] with a programmatic access to start a parametrized capture process."

So it seems clear that this is not related to streams in any way either.

The Notes column for this API on the DAP page [1] also states:

"Programmatic API that completes the form based approach
Need to check if still interest in this
How does it relate with the Web RTC Working Group?"

Is there an updated position on this?

If you then head over to the WebRTC WG's charter [7], it states:

"...to define client-side APIs to enable Real-Time Communications in Web browsers. These APIs should enable building applications that can be run inside a browser, requiring no extra downloads or plugins, that allow communication between parties using audio, video and supplementary real-time communication, without having to use intervening servers..."

So this is clearly focused upon peer-to-peer communication "between" systems, and the stream-related access is naturally just treated as an ancillary requirement. The scope section then states:

"Enabling real-time communications between Web browsers require the following client-side technologies to be available:

- API functions to explore device capabilities, e.g.
camera, microphone, speakers (currently in scope for the Device APIs & Policy Working Group)
- API functions to capture media from local devices (camera and microphone) (currently in scope for the Device APIs & Policy Working Group)
- API functions for encoding and other processing of those media streams,
- API functions for establishing direct peer-to-peer connections, including firewall/NAT traversal
- API functions for decoding and processing (including echo cancelling, stream synchronization and a number of other functions) of those streams at the incoming end,
- Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5)"

So this is where I really start to feel the gap growing. The DAP WG points to the RTC WG, unsure whether its Camera/Microphone APIs are being superseded by the work there... and the RTC WG then points back to say it will be relying on work in the DAP WG.

However, the RTC WG's Recommended Track Deliverables list does include: "Media Stream Functions, Audio Stream Functions and Video Stream Functions"

So then it's back to the WHATWG MediaStream and LocalMediaStream current work [8]. Following this through, you end up back at the <audio> and <video> media elements with some brief discussion about media data [9].

Currently the only API I'm aware of that allows live access to the audio data through the <audio> tag is the relatively proprietary Mozilla Audio Data API [10]. And while the video stream data can be accessed by rendering each frame into a canvas 2d graphics context and then using getImageData to extract and manipulate it from there [11], this seems more like a workaround than an elegantly designed solution.

As I said above, this is not intended as a criticism of the work that the DAP WG, WebRTC WG or WHATWG are doing.
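To make the current state concrete, the canvas workaround described in [11] looks roughly like this. This is only a sketch: the element ids, the grayscale transform and the frame interval are my own illustrative assumptions, not anything mandated by the specs above.

```javascript
// Per-frame pixel access by copying <video> frames into a 2D canvas.
// Assumes a page containing <video id="v"> and <canvas id="c">.

// Pure pixel transform: overwrite each RGBA pixel with its luminance.
function toGrayscale(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    const y = Math.round(
      0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2]
    );
    pixels[i] = pixels[i + 1] = pixels[i + 2] = y;
  }
  return pixels;
}

// Copy the current video frame in, transform the raw data, write it back.
function processFrame(video, ctx) {
  ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
  const frame = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
  toGrayscale(frame.data);
  ctx.putImageData(frame, 0, 0);
}

// Browser wiring (skipped when no DOM is available).
if (typeof document !== "undefined") {
  const video = document.getElementById("v");
  const ctx = document.getElementById("c").getContext("2d");
  video.addEventListener("play", function tick() {
    if (!video.paused && !video.ended) {
      processFrame(video, ctx);
      setTimeout(tick, 40); // poll at roughly 25 fps
    }
  });
}
```

Note there is no event that hands you the decoded frame itself; you have to poll and round-trip every frame through the canvas, which is exactly why this feels like a workaround rather than a designed API.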
It's intended as constructive feedback to highlight that the important use case of "Access to live/raw audio and video stream data from both local and remote sources" appears to be falling into the gaps between the groups.

From my perspective this is a critical use case for many advanced web apps, and one that will help bring them in line with what's currently possible in native, single-vendor-stack apps (e.g. iPhone & Android). It's also critical for the advancement of web-standards-based AR applications and other computer vision, hearing and signal processing applications.

I understand that a lot of the specifications I've covered are at very formative stages, and that requirements and PWDs are being drafted as I write. That's exactly why I'm raising this as a single, consolidated perspective that spans all these groups. I hope this goes some way towards "Access to live/raw audio and video stream data from both local and remote sources" being treated as an essential and core use case that binds together the work of all these groups. With a clear vision for this, and a little consolidated work, I think this will also open up a wide range of other app opportunities that we haven't even thought of yet. But at the moment it really feels like this is being treated as an assumed requirement, and it could end up as a poorly formed, second-class bundle of semi-related API hooks.
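For contrast, the Mozilla Audio Data API [10] mentioned above already gives live access to raw audio samples and lets transformed data be written back out. A rough sketch of the read-transform-inject flow follows; the element id and gain value are my own illustrative assumptions, and the moz-prefixed calls are Firefox-only:

```javascript
// Raw audio access via the (Firefox-only) Mozilla Audio Data API [10].
// Assumes a page containing <audio id="a" src="..."> as the input.

// Pure sample transform: scale each float sample by a gain factor.
function applyGain(samples, gain) {
  for (let i = 0; i < samples.length; i++) {
    samples[i] *= gain;
  }
  return samples;
}

// Browser wiring (only meaningful in a 2011-era Firefox).
if (typeof document !== "undefined" && typeof Audio !== "undefined") {
  const input = document.getElementById("a");
  const output = new Audio(); // a second element to inject samples into

  input.addEventListener("loadedmetadata", function () {
    // mozChannels / mozSampleRate describe the decoded stream.
    output.mozSetup(input.mozChannels, input.mozSampleRate);
  });

  input.addEventListener("MozAudioAvailable", function (event) {
    // event.frameBuffer is a Float32Array of raw decoded samples;
    // transform it and write it back into the output stream.
    output.mozWriteAudio(applyGain(event.frameBuffer, 0.5));
  });
}
```

This is roughly the shape of API I'd like to see standardised across audio and video, local and remote, rather than remaining a single-vendor experiment.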
For this use case I'd really like these clear requirements to be supported:

- access to the raw stream data for both audio and video in similar ways
- access to the raw stream data from both remote and local streams in similar ways
- the ability to inject new data, or the transformed original data, back into streams and presented audio/video tags in a consistent way
- all of this optimised for performance to meet the demands of live signal processing

roBman

PS: I've also cc'd in the mozilla dev list as I think this directly relates to the current "booting to the web" thread [12]

[1] http://www.w3.org/2009/dap/
[2] http://www.w3.org/TR/2011/WD-html-media-capture-20110414/#introduction
[3] http://dev.w3.org/html5/html-device/
[4] http://www.whatwg.org/specs/web-apps/current-work/complete/#devices
[5] http://www.whatwg.org/specs/web-apps/current-work/complete/#auto-toc-9
[6] http://www.w3.org/TR/2010/WD-media-capture-api-20100928/
[7] http://www.w3.org/2011/04/webrtc-charter.html
[8] http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html#mediastream
[9] http://www.whatwg.org/specs/web-apps/current-work/complete/the-iframe-element.html#media-data
[10] https://wiki.mozilla.org/Audio_Data_API
[11] https://developer.mozilla.org/En/Manipulating_video_using_canvas
[12] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7668a9d46a43e482#
Received on Tuesday, 26 July 2011 17:56:11 UTC