- From: Rob Manson <roBman@mob-labs.com>
- Date: Fri, 05 Aug 2011 18:37:39 +1000
- To: Harald Alvestrand <harald@alvestrand.no>, "public-webrtc@w3.org" <public-webrtc@w3.org>
That's great Harald. Thanks.

If you have any tips on a better format or forum for this type of submission please let me know.

roBman

On Fri, 2011-08-05 at 10:31 +0200, Harald Alvestrand wrote:
> On 08/03/11 01:37, Rob Manson wrote:
> > Hi Harald, Stefan and François,
> >
> > it's been a week since I sent a detailed email [1] listing a use case
> > with related requirements for access to audio/video stream data.
> Sorry, but since you crossposted this to the WHATWG list, my filters put
> it into my WHATWG folder, which is actually so busy that I'm not able to
> keep up - I only scan the subject lines for stuff that seems WebRTC
> relevant.
> Similar things may have happened to others - this is one of my worries
> about the WHATWG process.
>
> I'll rearrange my filters so that messages to public-webrtc are picked
> out first (and then find your message and move it and consider it).
> > I understand you are all busy, but as the WebRTC WG representatives
> > could you please let me know if this has been reviewed in any way and if
> > any issues or tasks have been raised based on this email?
> >
> > Or, if this has been discarded, could you please let me know that this is
> > the case and what the reasons for that were.
> >
> > NOTE: The somewhat related thread "Clarification on media capture split
> > between WebRTC and DAP" [2] also seems to have been left unresolved.
> >
> > I've had a number of emails from people off-list in support of this use
> > case, so now I am even more convinced that this is a real issue that is
> > worth further discussion.
> >
> > roBman
> >
> > [1] http://lists.w3.org/Archives/Public/public-webrtc/2011Jul/0170.html
> > [2] http://lists.w3.org/Archives/Public/public-webrtc/2011Jul/0145.html
> >
> > On Wed, 2011-07-27 at 10:56 +1000, Rob Manson wrote:
> >> Hi,
> >>
> >> sorry for posting across multiple groups, but I hope you'll see from my
> >> comments below that this is really needed.
> >>
> >> This is definitely not intended as criticism of any of the work going
> >> on. It's intended as constructive feedback that hopefully provides
> >> clarification on a key use case and its supporting requirements:
> >>
> >>     "Access to live/raw audio and video stream data from both local
> >>     and remote sources in a consistent way"
> >>
> >> I've spent quite a bit of time trying to follow a clear thread of
> >> requirements/solutions that provide API access to raw stream data (e.g.
> >> audio, video, etc.). But I'm a bit concerned this is falling into the gap
> >> between the DAP and RTC WGs. If this is not the case then please point
> >> me to the relevant docs and I'll happily get back in my box 8)
> >>
> >> Here's how the thread seems to flow at the moment based on public
> >> documents.
> >>
> >> On the DAP page [1] the mission states:
> >>
> >>     "the Device APIs and Policy Working Group is to create
> >>     client-side APIs that enable the development of Web Applications
> >>     and Web Widgets that interact with devices services such as
> >>     Calendar, Contacts, Camera, etc"
> >>
> >> So it seems clear that this is the place to start. Further down that
> >> page the "HTML Media Capture" and "Media Capture" APIs are listed.
> >>
> >> HTML Media Capture (camera/microphone interactions through HTML forms)
> >> initially seems like a good candidate; however, the intro in the latest
> >> PWD [2] clearly states:
> >>
> >>     "Providing streaming access to these capabilities is outside of
> >>     the scope of this specification."
> >>
> >> followed by a NOTE that states:
> >>
> >>     "The Working Group is investigating the opportunity to specify
> >>     streaming access via the proposed <device> element."
> >>
> >> The "proposed <device> element" link [3] leads to a "no longer
> >> maintained" document that then redirects to the top level of the WHATWG
> >> "current work" page [4]. On that page the most relevant link is the
> >> video conferencing and peer-to-peer communication section [5].
> >> More about that further below.
> >>
> >> So back to the DAP page to explore the other Media Capture API
> >> (programmatic access to camera/microphone) [1] and its latest PWD [6].
> >> The abstract states:
> >>
> >>     "This specification defines an Application Programming Interface
> >>     (API) that provides access to the audio, image and video capture
> >>     capabilities of the device."
> >>
> >> And the introduction states:
> >>
> >>     "The Capture API defines a high-level interface for accessing
> >>     the microphone and camera of a hosting device. It completes the
> >>     HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE]
> >>     with a programmatic access to start a parametrized capture
> >>     process."
> >>
> >> So it seems clear that this is not related to streams in any way either.
> >>
> >> The Notes column for this API on the DAP page [1] also states:
> >>
> >>     "Programmatic API that completes the form based approach
> >>     Need to check if still interest in this
> >>     How does it relate with the Web RTC Working Group?"
> >>
> >> Is there an updated position on this?
> >>
> >> If you then head over to the WebRTC WG's charter [7], it states:
> >>
> >>     "...to define client-side APIs to enable Real-Time
> >>     Communications in Web browsers.
> >>
> >>     These APIs should enable building applications that can be run
> >>     inside a browser, requiring no extra downloads or plugins, that
> >>     allow communication between parties using audio, video and
> >>     supplementary real-time communication, without having to use
> >>     intervening servers..."
> >>
> >> So this is clearly focused upon peer-to-peer communication "between"
> >> systems, and the stream-related access is naturally just treated as an
> >> ancillary requirement. The scope section then states:
> >>
> >>     "Enabling real-time communications between Web browsers require
> >>     the following client-side technologies to be available:
> >>
> >>     - API functions to explore device capabilities, e.g. camera,
> >>       microphone, speakers (currently in scope for the Device APIs &
> >>       Policy Working Group)
> >>     - API functions to capture media from local devices (camera and
> >>       microphone) (currently in scope for the Device APIs & Policy
> >>       Working Group)
> >>     - API functions for encoding and other processing of those media
> >>       streams,
> >>     - API functions for establishing direct peer-to-peer
> >>       connections, including firewall/NAT traversal
> >>     - API functions for decoding and processing (including echo
> >>       cancelling, stream synchronization and a number of other
> >>       functions) of those streams at the incoming end,
> >>     - Delivery to the user of those media streams via local screens
> >>       and audio output devices (partially covered with HTML5)"
> >>
> >> So this is where I really start to feel the gap growing. The DAP WG
> >> points to the WebRTC WG, unsure whether its Camera/Microphone APIs are
> >> being superseded by the work there...and the WebRTC WG then points back
> >> to say it will be relying on work in the DAP WG. However, the WebRTC WG's
> >> Recommended Track Deliverables list does include:
> >>
> >>     "Media Stream Functions, Audio Stream Functions and Video Stream
> >>     Functions"
> >>
> >> So then it's back to the WHATWG MediaStream and LocalMediaStream current
> >> work [8]. Following this through, you end up back at the <audio> and
> >> <video> media elements with some brief discussion about media data [9].
> >>
> >> Currently the only API that I'm aware of that allows live access to the
> >> audio data through the <audio> tag is the relatively proprietary Mozilla
> >> Audio Data API [10].
> >>
> >> And while the video stream data can be accessed by rendering each frame
> >> into a canvas 2d graphics context and then using getImageData to extract
> >> and manipulate it from there [11], this seems more like a workaround
> >> than an elegantly designed solution.
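[The canvas workaround mentioned above can be sketched roughly as follows. This is an editorial illustration only; the function names are made up, and only `drawImage`, `getImageData` and `putImageData` are actual Canvas 2D API calls.]

```javascript
// Pure per-frame transform: convert flat RGBA pixel data
// ([r, g, b, a, r, g, b, a, ...]) to greyscale in place.
function toGreyscale(data) {
  for (let i = 0; i < data.length; i += 4) {
    const y = Math.round(
      0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]
    );
    data[i] = data[i + 1] = data[i + 2] = y; // leave alpha untouched
  }
  return data;
}

// Browser wiring (sketch): copy each <video> frame through a 2D canvas
// context, since there is no direct API for raw frame access.
function processFrame(video, ctx) {
  ctx.drawImage(video, 0, 0);
  const frame = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
  toGreyscale(frame.data);
  ctx.putImageData(frame, 0, 0);
  requestAnimationFrame(() => processFrame(video, ctx));
}
```

Note that this forces a full pixel copy out of (and back into) the canvas on every frame, which is part of why it feels like a workaround rather than a designed streaming API.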
> >>
> >> As I said above, this is not intended as a criticism of the work that
> >> the DAP WG, WebRTC WG or WHATWG are doing. It's intended as
> >> constructive feedback to highlight that the important use case of
> >> "Access to live/raw audio and video stream data from both local and
> >> remote sources" appears to be falling into the gaps between the groups.
> >>
> >> From my perspective this is a critical use case for many advanced web
> >> apps that will help bring them in line with what's possible in the
> >> native, single-vendor stack based apps at the moment (e.g. iPhone &
> >> Android). And it's also critical for the advancement of web standards
> >> based AR applications and other computer vision, hearing and signal
> >> processing applications.
> >>
> >> I understand that a lot of the specifications I've covered are at very
> >> formative stages and that requirements and PWDs are just being drafted
> >> as I write. And that's exactly why I'm raising this as a single,
> >> consolidated perspective that spans all these groups. I hope this goes
> >> some way towards "Access to live/raw audio and video stream data from
> >> both local and remote sources" being treated as an essential and core
> >> use case that binds together the work of all these groups. With a clear
> >> vision for this, and a little consolidated work, I think this will then
> >> also open up a wide range of other app opportunities that we haven't
> >> even thought of yet. But at the moment it really feels like this is
> >> being treated as an assumed requirement and could end up as a poorly
> >> formed, second-class bundle of semi-related API hooks.
> >>
> >> For this use case I'd really like these clear requirements to be
> >> supported:
> >>
> >> - access the raw stream data for both audio and video in similar ways
> >> - access the raw stream data from both remote and local streams in
> >>   similar ways
> >> - ability to inject new data, or the transformed original data, back into
> >>   streams and presented audio/video tags in a consistent way
> >> - all of this optimised for performance to meet the demands of live
> >>   signal processing
> >>
> >> roBman
> >>
> >> PS: I've also cc'd in the Mozilla dev list as I think this directly
> >> relates to the current "booting to the web" thread [12]
> >>
> >> [1] http://www.w3.org/2009/dap/
> >> [2] http://www.w3.org/TR/2011/WD-html-media-capture-20110414/#introduction
> >> [3] http://dev.w3.org/html5/html-device/
> >> [4] http://www.whatwg.org/specs/web-apps/current-work/complete/#devices
> >> [5] http://www.whatwg.org/specs/web-apps/current-work/complete/#auto-toc-9
> >> [6] http://www.w3.org/TR/2010/WD-media-capture-api-20100928/
> >> [7] http://www.w3.org/2011/04/webrtc-charter.html
> >> [8] http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html#mediastream
> >> [9] http://www.whatwg.org/specs/web-apps/current-work/complete/the-iframe-element.html#media-data
> >> [10] https://wiki.mozilla.org/Audio_Data_API
> >> [11] https://developer.mozilla.org/En/Manipulating_video_using_canvas
> >> [12] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7668a9d46a43e482#
Received on Friday, 5 August 2011 08:38:18 UTC