Re: updates to requirements document

Young, Milan wrote:
> I believe this newly proposed requirement **is** tied to existing 
> material in the spec.  Section 5.10 reads:
> 
> “Local media stream captures are common in a variety of sharing 
> scenarios such as:
> 
> capture a video and upload to a video sharing site
> 
> capture a picture for my user profile picture in a given web app
> 
> capture audio for a translation site
> 
> capture a video chat/conference”
> 
> I’d argue that perhaps the first two and definitely the third scenario 
> require the application layer to have access to the media.

1) What you really want is not ex post facto access to the encoded form
of data from a camera, but a general method of encoding a stream. As
soon as you want to do any processing on the client side (even as simple
as cropping, scaling, etc.) you're going to want to re-encode before
uploading. At that point, I have no idea what this requirement has to do
with capture. It applies equally to a MediaStream from any source.

In practice in WebRTC, the encoding actually happens right before the
data goes to the network, and the process is intimately tied to the
real-time nature of RTP and the constraints of the network. An "encoded
representation of the media" doesn't exist before that point. You could
satisfy this use-case in some (non-ideal) form today by doing what
Randell suggests (using WebRTC and capturing the RTP stream, a la
SIPREC). That at least wouldn't require any additional spec work.

2) For the image capture case, you almost certainly don't want an
encoded video stream, you want to encode an image. There's already a way
to do this (via the Canvas APIs).

3) For translation (which implies speech recognition), a) if you're
doing this on the client-side, you want access to the _uncompressed_
media, not the compressed form. Every re-compression step only makes
your job harder, and b) if you're doing this on the server side, then
latency becomes very important, and the RTP recording suggested in step
1 is actually what you want, not some offline storage format.

4) Again, if you want to record this on the server, you want access to
the RTP (preferably at the conference mixer, assuming there is one). No
need for a browser API for that case. If you want to record it on the
client, you want the general encoding API outlined in 1), but again this
has nothing to do with media capture (as in camera/microphone access).

From the scenarios outlined above, I'm still looking for where the
MediaSource API (which "extends HTMLMediaElement to allow JavaScript to
generate media streams for playback") becomes at all relevant. Please
clue me in.

Received on Wednesday, 11 July 2012 17:30:18 UTC