Re: Access to live/raw audio and video stream data from both local and remote sources

That's great, Harald.

Thanks.

If you have any tips on a better format or forum for this type of
submission, please let me know.


roBman


On Fri, 2011-08-05 at 10:31 +0200, Harald Alvestrand wrote:
> On 08/03/11 01:37, Rob Manson wrote:
> > Hi Harald, Stefan and François,
> >
> > it's been a week since I sent a detailed email [1] listing a use case
> > with related requirements for access to audio/video stream data.
> Sorry, but since you crossposted this to the WHATWG list, my filters put 
> it into my WHATWG folder, which is actually so busy that I'm not able to 
> keep up - I only scan the subject lines for stuff that seems WebRTC 
> relevant.
> Similar things may have happened to others - this is one of my worries 
> about the WHATWG process.
> 
> I'll rearrange my filters so that messages to public-webrtc are picked 
> out first (and then find your message, move it and consider it).
> > I understand you are all busy, but as the WebRTC WG representatives
> > could you please let me know if this has been reviewed in any way and if
> > any issues or tasks have been raised based on this email?
> >
> > Or, if it has been discarded, could you please let me know that this is
> > the case and what the reasons were.
> >
> > NOTE: The somewhat related thread "Clarification on media capture split
> > between WebRTC and DAP" [2] also seems to be left unresolved.
> >
> > I've had a number of emails from people off-list in support of this use
> > case so now I am even more convinced that this is a real issue that is
> > worth further discussion.
> >
> >
> > roBman
> >
> >
> > [1] http://lists.w3.org/Archives/Public/public-webrtc/2011Jul/0170.html
> > [2] http://lists.w3.org/Archives/Public/public-webrtc/2011Jul/0145.html
> >
> >
> >
> > On Wed, 2011-07-27 at 10:56 +1000, Rob Manson wrote:
> >> Hi,
> >>
> >> sorry for posting across multiple groups, but I hope you'll see from my
> >> comments below that this is really needed.
> >>
> >> This is definitely not intended as criticism of any of the work going
> >> on.  It's intended as constructive feedback that hopefully provides
> >> clarification on a key use case and its supporting requirements.
> >>
> >>          "Access to live/raw audio and video stream data from both local
> >>          and remote sources in a consistent way"
> >>
> >> I've spent quite a bit of time trying to follow a clear thread of
> >> requirements/solutions that provide API access to raw stream data (e.g.
> >> audio and video).  But I'm a bit concerned this is falling into the gap
> >> between the DAP and RTC WGs.  If this is not the case then please point
> >> me to the relevant docs and I'll happily get back in my box 8)
> >>
> >> Here's how the thread seems to flow at the moment based on public
> >> documents.
> >>
> >> On the DAP page [1] the mission states:
> >>          "the Device APIs and Policy Working Group is to create
> >>          client-side APIs that enable the development of Web Applications
> >>          and Web Widgets that interact with device services such as
> >>          Calendar, Contacts, Camera, etc"
> >>
> >> So it seems clear that this is the place to start.  Further down that
> >> page the "HTML Media Capture" and "Media Capture" APIs are listed.
> >>
> >> HTML Media Capture (camera/microphone interactions through HTML forms)
> >> initially seems like a good candidate; however, the intro in the latest
> >> PWD [2] clearly states:
> >>          "Providing streaming access to these capabilities is outside of
> >>          the scope of this specification."
> >>
> >> Followed by a NOTE that states:
> >>          "The Working Group is investigating the opportunity to specify
> >>          streaming access via the proposed <device> element."
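> >>
> >> Just to make the contrast concrete, here's a minimal sketch of what
> >> form-based capture gives you (the capture attribute syntax varied
> >> between drafts, so treat the exact spelling below as illustrative):
> >>
> >>     // HTML Media Capture, i.e. roughly
> >>     // <input type="file" accept="image/*" capture="camera">.
> >>     // The result is a finished File after capture, not live data.
> >>     var input = document.createElement('input');
> >>     input.type = 'file';
> >>     input.accept = 'image/*';
> >>     input.setAttribute('capture', 'camera'); // hint: use the camera
> >>     input.addEventListener('change', function () {
> >>       var file = input.files[0]; // a static captured image, no stream
> >>     }, false);
> >>     document.body.appendChild(input);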
> >>
> >> The "proposed <device> element" link [3] leads to a "no longer
> >> maintained" document that then redirects to the top level of the whatwg
> >> "current work" page [4].  On that page the most relevant link is the
> >> video conferencing and peer-to-peer communication section [5].  More
> >> about that further below.
> >>
> >> So it's back to the DAP page to explore the other Media Capture API
> >> (programmatic access to camera/microphone) [1] and its latest PWD [6].
> >> The abstract states:
> >>          "This specification defines an Application Programming Interface
> >>          (API) that provides access to the audio, image and video capture
> >>          capabilities of the device."
> >>
> >> And the introduction states:
> >>          "The Capture API defines a high-level interface for accessing
> >>          the microphone and camera of a hosting device. It completes the
> >>          HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE]
> >>          with a programmatic access to start a parametrized capture
> >>          process."
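> >>
> >> In other words it's programmatic, but still capture-then-return.  A
> >> minimal sketch (interface names as I read them in [6]):
> >>
> >>     // DAP Media Capture API: the success callback receives finished
> >>     // MediaFile objects once capture has completed, never a live stream.
> >>     navigator.device.capture.captureImage(
> >>       function (mediaFiles) {
> >>         // mediaFiles is an array of captured MediaFile objects
> >>       },
> >>       function (error) {
> >>         // capture failed or was cancelled
> >>       },
> >>       { limit: 1 } // capture at most one still image
> >>     );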
> >>
> >> So it seems clear that this is not related to streams in any way either.
> >>
> >> The Notes column for this API on the DAP page [1] also states:
> >>          "Programmatic API that completes the form based approach
> >>          Need to check if still interest in this
> >>          How does it relate with the Web RTC Working Group?"
> >>
> >> Is there an updated position on this?
> >>
> >> So if you then head over to the WebRTC WG's charter [7] it states:
> >>          "...to define client-side APIs to enable Real-Time
> >>          Communications in Web browsers.
> >>
> >>          These APIs should enable building applications that can be run
> >>          inside a browser, requiring no extra downloads or plugins, that
> >>          allow communication between parties using audio, video and
> >>          supplementary real-time communication, without having to use
> >>          intervening servers..."
> >>
> >> So this is clearly focused upon peer-to-peer communication "between"
> >> systems, and stream-related access is naturally just treated as an
> >> ancillary requirement.  The scope section then states:
> >>          "Enabling real-time communications between Web browsers require
> >>          the following client-side technologies to be available:
> >>
> >>          - API functions to explore device capabilities, e.g. camera,
> >>          microphone, speakers (currently in scope for the Device APIs &
> >>          Policy Working Group)
> >>          - API functions to capture media from local devices (camera and
> >>          microphone) (currently in scope for the Device APIs & Policy
> >>          Working Group)
> >>          - API functions for encoding and other processing of those media
> >>          streams,
> >>          - API functions for establishing direct peer-to-peer
> >>          connections, including firewall/NAT traversal
> >>          - API functions for decoding and processing (including echo
> >>          cancelling, stream synchronization and a number of other
> >>          functions) of those streams at the incoming end,
> >>          - Delivery to the user of those media streams via local screens
> >>          and audio output devices (partially covered with HTML5)"
> >>
> >> So this is where I really start to feel the gap growing.  The DAP
> >> points to the RTC, unsure whether its Camera/Microphone APIs are being
> >> superseded by the RTC's work...and the RTC points back, saying it will
> >> rely on work in the DAP.  However, the RTC's Recommended Track
> >> Deliverables list does include:
> >>          "Media Stream Functions, Audio Stream Functions and Video Stream
> >>          Functions"
> >>
> >> So then it's back to the whatwg MediaStream and LocalMediaStream current
> >> work [8].  Following this through, you end up back at the <audio> and
> >> <video> media elements with some brief discussion about media data [9].
> >>
> >> Currently the only API I'm aware of that allows live access to the
> >> audio data through the <audio> tag is the vendor-specific Mozilla
> >> Audio Data API [10].
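> >>
> >> For illustration, here's a minimal sketch of reading raw samples with
> >> that API (Firefox-only, per [10]):
> >>
> >>     // Read raw PCM samples from a playing <audio> element.
> >>     var audio = document.getElementsByTagName('audio')[0];
> >>     var channels, sampleRate;
> >>     audio.addEventListener('loadedmetadata', function () {
> >>       channels = audio.mozChannels;     // reported by the decoder
> >>       sampleRate = audio.mozSampleRate; // e.g. 44100
> >>     }, false);
> >>     audio.addEventListener('MozAudioAvailable', function (event) {
> >>       var samples = event.frameBuffer;  // Float32Array of raw samples
> >>       // analyse or transform the samples here (FFT, gain, etc.)
> >>     }, false);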
> >>
> >> And while the video stream data can be accessed by rendering each frame
> >> into a canvas 2D graphics context and then using getImageData to extract
> >> and manipulate it from there [11], this seems more like a workaround
> >> than an elegantly designed solution.
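> >>
> >> That workaround looks roughly like this (a minimal sketch based on the
> >> technique in [11]; the polling interval is my own assumption):
> >>
> >>     // Copy each <video> frame into a 2D canvas, then pull the pixels
> >>     // back out with getImageData.
> >>     var video = document.getElementsByTagName('video')[0];
> >>     var canvas = document.createElement('canvas');
> >>     var ctx = canvas.getContext('2d');
> >>     video.addEventListener('play', function () {
> >>       canvas.width = video.videoWidth;
> >>       canvas.height = video.videoHeight;
> >>       var timer = setInterval(function () {
> >>         if (video.paused || video.ended) { clearInterval(timer); return; }
> >>         ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
> >>         var frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
> >>         // frame.data is a flat RGBA byte array: inspect/modify it here
> >>         ctx.putImageData(frame, 0, 0); // paint the result back
> >>       }, 40); // ~25fps; the timing itself is part of the workaround
> >>     }, false);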
> >>
> >> As I said above, this is not intended as a criticism of the work that
> >> the DAP WG, WebRTC WG or WHATWG are doing.  It's intended as
> >> constructive feedback to highlight that the important use case of
> >> "Access to live/raw audio and video stream data from both local and
> >> remote sources" appears to be falling into the gaps between the groups.
> >>
> >> From my perspective this is a critical use case for many advanced web
> >> apps that will help bring them in line with what's currently possible in
> >> native, single-vendor-stack apps (e.g. iPhone & Android).  And it's also
> >> critical for the advancement of web-standards-based AR applications and
> >> other computer vision, hearing and signal processing applications.
> >>
> >> I understand that many of the specifications I've covered are at very
> >> formative stages and that requirements and PWDs are being drafted as I
> >> write.  That's exactly why I'm raising this as a single, consolidated
> >> perspective that spans all these groups.  I hope it goes some way
> >> towards "Access to live/raw audio and video stream data from both local
> >> and remote sources" being treated as an essential, core use case that
> >> binds together the work of all these groups.  With a clear vision and a
> >> little consolidated work, I think this will also open up a wide range of
> >> app opportunities that we haven't even thought of yet.  But at the
> >> moment it really feels like it is being treated as an assumed
> >> requirement, and it could end up as a poorly formed, second-class bundle
> >> of semi-related API hooks.
> >>
> >> For this use case I'd really like these clear requirements to be
> >> supported (a purely hypothetical sketch follows the list):
> >> - access to the raw stream data for both audio and video in similar ways
> >> - access to the raw stream data from both remote and local streams in
> >> similar ways
> >> - the ability to inject new data, or the transformed original data, back
> >> into streams and presented audio/video tags in a consistent way
> >> - all of this optimised for performance to meet the demands of live
> >> signal processing
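> >>
> >> To show the shape I have in mind, here's that purely hypothetical
> >> sketch (none of the names below exist in any current draft; the point
> >> is one read/write surface whether the stream is local or remote):
> >>
> >>     // Hypothetical: the same handlers work for a local camera/mic
> >>     // stream or a remote peer-to-peer stream.
> >>     function process(stream) {
> >>       stream.onaudiodata = function (e) {
> >>         var s = e.samples;               // raw PCM as a Float32Array
> >>         for (var i = 0; i < s.length; i++) s[i] *= 0.5; // simple gain
> >>         stream.writeAudio(s);            // inject transformed audio back
> >>       };
> >>       stream.onvideodata = function (e) {
> >>         var p = e.pixels;                // raw RGBA frame bytes
> >>         for (var i = 0; i < p.length; i += 4) p[i] = 255 - p[i];
> >>         stream.writeVideo(p);            // inject transformed frame back
> >>       };
> >>     }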
> >>
> >> roBman
> >>
> >> PS: I've also cc'd the mozilla dev list, as I think this directly
> >> relates to the current "booting to the web" thread [12].
> >>
> >>
> >> [1] http://www.w3.org/2009/dap/
> >> [2] http://www.w3.org/TR/2011/WD-html-media-capture-20110414/#introduction
> >> [3] http://dev.w3.org/html5/html-device/
> >> [4] http://www.whatwg.org/specs/web-apps/current-work/complete/#devices
> >> [5] http://www.whatwg.org/specs/web-apps/current-work/complete/#auto-toc-9
> >> [6] http://www.w3.org/TR/2010/WD-media-capture-api-20100928/
> >> [7] http://www.w3.org/2011/04/webrtc-charter.html
> >> [8] http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html#mediastream
> >> [9] http://www.whatwg.org/specs/web-apps/current-work/complete/the-iframe-element.html#media-data
> >> [10] https://wiki.mozilla.org/Audio_Data_API
> >> [11] https://developer.mozilla.org/En/Manipulating_video_using_canvas
> >> [12] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7668a9d46a43e482#