RE: updates to requirements document

Regardless of the challenge, specs like Media Source are moving in the direction of exposing encoded bit streams. I think the questions you pose below are good to discuss further and should not block the creation of the requirement that Milan proposed. To fulfill the use-cases, I posit that JavaScript should be able to get access to *something* while a capture is in-progress. We have the ability to figure out what that something should be.

From: Randell Jesup [mailto:randell-ietf@jesup.org]
Sent: Wednesday, July 11, 2012 8:42 AM
To: public-media-capture@w3.org
Subject: Re: updates to requirements document

On 7/11/2012 10:03 AM, Young, Milan wrote:
As a reminder, I’m proposing that we add the following requirement: “The UA must allow the Application to access both an encoded representation of the media and associated control information needed for decoding while capture is in progress.”

Any objections?


This raises the question: what encoding?  How is that specified?  Where is it encoded?  How is the encoder controlled?  Bit rate?  Congestion control if it goes over a wire?  Typically this goes into PeerConnection and is encoded in some manner, but the app doesn't have access to the bytestreams.  This proposal calls for decomposing codecs and encoding from PeerConnection, which would be a significant architectural change.  See also Harald's 'rant' about bytestreams from last year sometime on the webrtc w3 list (I think).

The "translation" case could be handled by either asking PeerConnection for a high-reliability connection (TCP, or FEC at the cost of bandwidth), or long re-transmit buffers and have the translation receiver use NACKs to repair errors.  This (if there was some way to get it encoded) would allow other methods of shipping the audio for translation (WebSockets for example).
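[Editor's note: a minimal sketch of the long-retransmit-buffer idea above, assuming nothing beyond plain JavaScript. The sender keeps a window of recently sent encoded-audio packets so the translation receiver can repair losses via NACKs rather than relying on a lossy real-time path. All names here (RetransmitBuffer, etc.) are illustrative, not from any spec.]

```javascript
// Hypothetical sender-side retransmit buffer: keeps the last `capacity`
// encoded packets so NACKed sequence numbers can be resent.
class RetransmitBuffer {
  constructor(capacity) {
    this.capacity = capacity;   // how many packets of history to keep
    this.packets = new Map();   // seq -> Uint8Array payload (insertion-ordered)
  }
  // Record an outgoing packet; evict the oldest once over capacity.
  store(seq, payload) {
    this.packets.set(seq, payload);
    if (this.packets.size > this.capacity) {
      const oldest = this.packets.keys().next().value;
      this.packets.delete(oldest);
    }
  }
  // Answer a NACK: return whichever missing packets are still buffered.
  handleNack(missingSeqs) {
    return missingSeqs
      .filter((seq) => this.packets.has(seq))
      .map((seq) => ({ seq, payload: this.packets.get(seq) }));
  }
}
```

The trade-off is the one described above: a longer buffer repairs more loss at the cost of memory and added latency on repair.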

And access to the encoded or decoded media itself is a potential security issue.  See the "MediaStream Security" presentation from the Mountain View W3C meeting at the interim in February.  That may or may not be relevant here; I haven't thought it through.  What's the trust model?  We have one for WebRTC.

And...  Defining the associated control information needed for decoding is a significant task, especially as it would need to be codec-agnostic.  (Which from the conversation I think you realize.)  This also is an API that I believe we at Mozilla (or some of us) disagree with (though I'm not the person primarily following this; I think Robert O'Callahan and Tim Terriberry are).




From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Friday, July 06, 2012 2:02 PM
To: Travis Leithead; Jim Barnett; Sunyang (Eric); public-media-capture@w3.org
Subject: RE: updates to requirements document

The Media Source spec is using the term “Byte Stream” [1] to denote the sequence of Initialization and Media Segments that you mention below.  (Essentially a container format around the raw media.)  But yes, we are thinking in the same direction and I agree that the exact content of that stream should remain implementation and task dependent.
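[Editor's note: purely illustrative sketch of the "Byte Stream" notion described above — one Initialization Segment followed by Media Segments, concatenated into a single Uint8Array. Segment contents are treated as opaque here; the real per-container formats are defined by the Media Source byte-stream registry linked below.]

```javascript
// Concatenate an init segment and media segments into one byte stream.
// Segments are opaque Uint8Arrays; this models only the sequencing.
function assembleByteStream(initSegment, mediaSegments) {
  const total = initSegment.length +
    mediaSegments.reduce((sum, seg) => sum + seg.length, 0);
  const stream = new Uint8Array(total);
  let offset = 0;
  stream.set(initSegment, offset);
  offset += initSegment.length;
  for (const seg of mediaSegments) {
    stream.set(seg, offset);
    offset += seg.length;
  }
  return stream;
}
```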

Returning to the topic at hand, we need to define requirements so that this group can address the documented use cases.  At present, “capture audio for a translation site” is dangling.

To this end, we still need to add a new requirement that the capture process exposes the bit stream.  I suggest the following (adjustments since the last iteration in bold): “The UA must allow the Application to access both an encoded representation of the media and associated control information needed for decoding while capture is in progress.”

Thanks

[1] http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#byte-stream-formats



From: Travis Leithead [mailto:travis.leithead@microsoft.com]
Sent: Friday, July 06, 2012 10:16 AM
To: Young, Milan; Jim Barnett; Sunyang (Eric); public-media-capture@w3.org
Subject: RE: updates to requirements document

Sounds right to me too.

Off topic:
Based on my reading of MediaSource, interoperating nicely with that API means that direct access to the capture stream (while capture is ongoing) basically involves making two types of byte sequences available (as Uint8Array): initialization segments and media segments (http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#init-segment). I don’t even think the capture spec itself would need to define the details of those; it could possibly just define identifiers that would allow JS to interpret what the underlying format is.

From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Friday, July 6, 2012 8:03 AM
To: Jim Barnett; Sunyang (Eric); public-media-capture@w3.org
Subject: RE: updates to requirements document

Thanks Jim.  That sounds right to me.


From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
Sent: Friday, July 06, 2012 6:02 AM
To: Sunyang (Eric); Young, Milan; public-media-capture@w3.org
Subject: RE: updates to requirements document

To summarize the discussion so far: it sounds like we agree that the App will sometimes need direct access to the capture stream, and at other times will want the capture streamed directly to a file or some other sink. The App may also want to combine the two (stream to a file while also directly accessing the capture). The main question is how much we need to define in our spec as opposed to pointing to other pre-existing specs. Does that sound right to you, Eric and Milan?


- Jim

From: Sunyang (Eric) [mailto:eric.sun@huawei.com]
Sent: Friday, July 06, 2012 2:18 AM
To: Young, Milan; Jim Barnett; public-media-capture@w3.org
Subject: Re: updates to requirements document

For “the ability to view the media stream in its encoded form”, I think we’d better reference media source API
http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html


The second paragraph you mentioned is more like the style of media source, welcome to html-media task force for discussion.

Yang
Huawei

From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Friday, July 06, 2012 2:02 PM
To: Sunyang (Eric); Jim Barnett; public-media-capture@w3.org
Subject: RE: updates to requirements document

I believe that there are several use cases that do not require exposing the bits of the media stream to the Application layer.  So I think it’s going too far to say that transport is always the burden of the Application.  I think a better way to phrase that is to say that the Application should always have the ability to view the media stream in its encoded form.

The tricky part is defining a canonical media form.  Will this be on sample intervals, fixed block size, logical compression boundaries, … ?  I don’t have a strong opinion, but I suspect fixed size blocks of data (ie N bytes at a time regardless of what those bytes represent) will be easiest to spec and most useful to the largest range of use cases.
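[Editor's note: a sketch of the "fixed size blocks" option above — hand the application N bytes at a time from the encoded capture, regardless of what those bytes represent. Purely illustrative; no capture API is assumed, just an already-encoded buffer.]

```javascript
// Yield fixed-size views over an encoded buffer, N bytes at a time.
// The final block may be short if the stream doesn't divide evenly.
function* fixedSizeBlocks(encoded, blockSize) {
  for (let offset = 0; offset < encoded.length; offset += blockSize) {
    yield encoded.subarray(offset, Math.min(offset + blockSize, encoded.length));
  }
}
```

The appeal of this option is exactly what the paragraph above notes: the delivery boundary is independent of sample intervals or compression boundaries, so it requires no codec knowledge to spec.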

Thanks

From: Sunyang (Eric) [mailto:eric.sun@huawei.com]
Sent: Thursday, July 05, 2012 7:50 PM
To: Jim Barnett; Young, Milan; public-media-capture@w3.org
Subject: Re: updates to requirements document

I wonder how clear the division should be.
I suggest we not touch the transport/upload part of the use cases, but we can remove from the requirements all application responsibilities that are not related to capture/permission. I think this is feasible and leaves room for improvement.

Yang
Huawei

From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
Sent: Friday, July 06, 2012 8:41 AM
To: Young, Milan; public-media-capture@w3.org
Subject: RE: updates to requirements document

That sounds reasonable to me.  I take you to be saying that transport/uploading are the Application’s responsibility, and that the only requirement on the UA is that it make the encoded representation available.  That gives a clear division of responsibilities.  Are there other opinions?  (My guess is that many of the requirements are worded incorrectly.)


- Jim

From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Thursday, July 05, 2012 7:37 PM
To: Jim Barnett; public-media-capture@w3.org
Subject: RE: updates to requirements document

Hello Jim, thanks for putting this together.

The 1st requirement under REMOTE MEDIA currently states: “The UA must be able to transmit media to one or more remote sites and to receive media from them.”  My concern is that the language is insufficient to handle all of the scenarios put forward in the section titled “Capturing a media stream” under “Design Considerations and Remarks”.  These are:

1) capture a video and upload to a video sharing site
2) capture a picture for my user profile picture in a given web app
3) capture audio for a translation site
4) capture a video chat/conference

The first two transfer types would typically be handled as a bulk transfer after capture completes, which is a good fit for conventional transports like HTTP.  The fourth type is an obvious match to WebRTC.  The third type is a mix of the two.  The application prefers real time transmission, but is probably willing to sacrifice a few seconds of latency in the interest of reliable transport.  Something like an application-specific streaming protocol over WebSockets seems appropriate.

My request could be satisfied with the following new requirement: “The UA must allow the Application to access an encoded representation of the media while capture is in progress.”  Implicit in this request is that the UA will not always explicitly handle media transfer, but I think that could be inferred from the other requirements.

Does this sound reasonable?

Thanks


From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
Sent: Tuesday, July 03, 2012 6:36 AM
To: public-media-capture@w3.org
Subject: updates to requirements document

I have filled out the requirements section in the use case document (http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html) and added links from the scenarios to the requirements. I have not modified any existing content or taken anything out of the document.

There’s still more work to do:

1) there are some free floating requirements that were suggested on the list but not incorporated in any of the scenarios.  Do we want to incorporate them into the scenarios or leave them as is?
2)  The scenarios contain lists of items that are similar to the requirements.  Do we want to remove them, or leave them in and modify them to match the requirements more closely?
3) I have organized the requirements into four classes: permissions, local media, remote media, and media capture.  Maybe it would  be better to have a different classification or a single list.

Let me know what you think.


- Jim





--

Randell Jesup

randell-ietf@jesup.org<mailto:randell-ietf@jesup.org>

Received on Wednesday, 11 July 2012 16:56:24 UTC