Re: Media Capture and Streams Last Call review; deadline May 15 (LC-3013) from Stefan Håkansson LK on 2015-10-06 (public-media-capture@w3.org from October 2015)

From: Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com>
Date: Tue, 6 Oct 2015 12:01:51 +0000
To: Janina Sajka <janina@rednote.net>, Dominique Hazael-Massieux <dom@w3.org>
CC: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-media-capture@w3.org" <public-media-capture@w3.org>, "public-pfwg@w3.org" <public-pfwg@w3.org>
Message-ID: <1447FA0C20ED5147A1AA0EF02890A64B37379FE9@ESESSMB209.ericsson.se>
Hi Janina,

thanks for your comments. My personal reflections below.

On 30/09/15 17:07, Janina Sajka wrote:
> Hi, Dom, Nigel:
>
> It has been my responsibility on behalf of PFWG to respond on this
> specification. I apologize for being so very late in responding.
>
> Overall, I believe PFWG can accept this API. However, I do have some
> concerns that might benefit from clarification. Details below.
>
> Before getting into my questions, I want to thank Nigel on behalf of PFWG for raising questions re text-based media
> content, especially as it might pertain to WebRTC.
>
> Dom, your response to Nigel makes sense to me. But, I think it
> illustrates a larger problem around the various specs being produced at
> W3C which relate to web-based media acquisition and delivery. We in PFWG
> have sometimes found it difficult to understand which spec fills which
> piece of the overall solution. Perhaps we need a document that names
> the various components and illustrates how they fit together? PFWG is
> considering suggesting this to the TAG. What do you think?
>
> Part of the problem is the word media itself. Let us recall that the
> newspaper on your stoop in the morning is considered "media." Clearly,
> it's not the kind of media that W3C will address, except perhaps in its
> web rendition. Similarly, to those of us who have worked on the MAUR,
> and I think to the WebRTC folks, "media" includes textual components
> alongside audio and video components. Consequently, I think it's
> understandable that each of us may have in mind different meanings when
> we see the term "media." I think just such a difference in definition
> may be behind Nigel's questions, and that is indeed how I understand
> Dom's response. It also tells me not to expect specification text related to
> text media synchronization, as understood in the MAUR, to be discussed in
> this document.
>
> If this is correct, my first comment is:
>
> 1.)	The Abstract is inaccurate and misleading. Currently, it reads:
> "This document defines a set of JavaScript APIs that allow local media,
> including audio and video, to be requested from a platform."
>
> 	Again, if my above summation is correct, this Abstract should
> 	not claim "media," but rather "audio and video components of
> 	media." It is not "including" audio and video. It is exclusive
> 	to those two components.
>
> 		SUGGESTED SOLUTION: "This document defines a set of
> 		JavaScript APIs that allow local audio and video media
> 		to be requested from a platform."

I think a change along these lines makes sense. Note that we will add 
text describing how it can be extended (as has already happened for 
depth streams [2]), so the text should probably say that this spec is 
only about audio and video, but that it can be extended.

>
> 2.)	It is unclear to me from my reading of this specification
> whether you consider it sufficient to synchronize multiple video and
> audio sources as our Media Accessibility User Requirements (MAUR)[1]
> contemplates. Specifically, Sec. 3.8 "Sign Translation" and Sec. 3.1
> "Described Video" require separate video and audio, respectively, from
> separate source files.

The spec has two basic concepts, MediaStream and MediaStreamTrack. A 
MediaStream is a way to group zero or more MediaStreamTracks and, 
quoting the spec, "All tracks in a MediaStream are intended to be 
synchronized when rendered.", so in principle the use cases in 
sections 3.1 and 3.8 of [1] could be supported.

The problem lies with how to create those MediaStreams. The 
specification only describes how to do it from live camera and 
microphone sources, so "separate source files" are out of scope. On the 
other hand, a camera and a microphone could capture an event (into two 
MediaStreamTracks, one video and one audio), while a commentator speaks 
into a second microphone that generates a third MediaStreamTrack. If those 
three MediaStreamTracks are grouped in one MediaStream, they are 
"intended to be synchronized when rendered", so that would work. A 
similar approach could be taken with two cameras, where the second camera 
produces a video MediaStreamTrack of a sign language interpreter.
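For illustration, the live-source scenario above might look like the following minimal TypeScript sketch. It assumes a browser that implements `navigator.mediaDevices.getUserMedia()` and the `MediaStream` constructor. The names `groupForSyncedRendering`, `captureEventWithCommentary` and `commentaryMicId` are hypothetical. The commentary microphone's `deviceId` is assumed to have been obtained earlier, for example from `navigator.mediaDevices.enumerateDevices()`.

```typescript
// Only the `kind` attribute ("audio" or "video") of a track is used here.
interface TrackLike {
  kind: string;
}

// Hypothetical helper: put the event's camera/microphone tracks and the
// commentary audio track(s) into one list, so that they can be grouped
// in a single MediaStream.
function groupForSyncedRendering<T extends TrackLike>(
  eventTracks: readonly T[],
  commentaryTracks: readonly T[],
): T[] {
  const commentaryAudio = commentaryTracks.filter((t) => t.kind === "audio");
  return [...eventTracks, ...commentaryAudio];
}

// Browser-only wiring (not invoked in this sketch). `commentaryMicId` is a
// hypothetical deviceId, e.g. taken from navigator.mediaDevices.enumerateDevices().
async function captureEventWithCommentary(commentaryMicId: string): Promise<MediaStream> {
  // Camera + first microphone: one video track and one audio track.
  const eventStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  // Second microphone for the commentator: a third track.
  const commentaryStream = await navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: commentaryMicId } },
  });
  // Tracks in the same MediaStream are "intended to be synchronized when rendered".
  return new MediaStream(
    groupForSyncedRendering(eventStream.getTracks(), commentaryStream.getTracks()),
  );
}
```

In a page, the returned stream could then be assigned to a `<video>` element's `srcObject` for playback. The grouping helper is kept separate from the capture calls so that its logic does not depend on camera or microphone access.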

However, to me the specification is already sufficiently clear on 
synchronization of content from live sources; I do not see a need to add 
specific examples for this to the specification.

When it comes to using "separate source files" and playing them in sync, 
this seems to me much more related to the HTML media elements. There may 
also be a relation to other specifications being developed by the TF, 
namely MediaStream Recording [3] and Media Capture from DOM Elements [4].


>
> Is this specification deemed sufficient to support these use cases? If
> so, it would be very helpful to have examples that include a 3.8 and a
> 3.1 additional media source in the specification.
>
> Thank you for your consideration of these comments, and for your
> forbearance with my tardiness.

You are most welcome!

Stefan

>
> Janina Sajka, on behalf of Protocols & Formats WG
>
>
> [1] http://w3c.github.io/pfwg/media-accessibility-reqs/
[2] http://w3c.github.io/mediacapture-depth/
[3] http://w3c.github.io/mediacapture-record/
[4] http://w3c.github.io/mediacapture-fromelement/

>
>
> Dominique Hazael-Massieux writes:
>> Hi Nigel,
>>
>> On 10/07/2015 16:11, Nigel Megitt wrote:
>>>
>>> Thank you for your response to my comment. I agree that the current WD
>>> does not deal with streams of data related to the media, so in that sense
>>> you have provided an accurate answer. However I am far from certain that
>>> this is acceptable. As far as I can see this constraint prevents WebRTC
>>> both from being augmented with accessibility data, for example
>>> subtitles/captions, and from being augmented with other data-based
>>> functionality such as the display of text or graphics not associated with
>>> accessibility.
>>
>> The fact that this particular API doesn't provide the necessary hooks
>> doesn't imply it's not doable with WebRTC.
>>
>> Indeed, for something like subtitles and captions, you can already re-use
>> the existing synchronization mechanisms provided by HTML media elements
>> (e.g. ontimeupdate events) to display text synchronously with the content
>> captured via getUserMedia.
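A minimal sketch of this timeupdate-based approach follows. It assumes the captions are available as cues timed against the `currentTime` of the `<video>` element playing the getUserMedia stream; the thread does not say how cue times would be established. `Cue`, `activeCaption`, `attachCaptions`, `TimedElement` and `TextSink` are hypothetical names. The two interfaces are structural stand-ins for `HTMLVideoElement` and an `HTMLElement`, so the logic can also run outside a browser. `addEventListener("timeupdate", ...)` is used here; it is equivalent to setting `ontimeupdate`.

```typescript
// A caption cue: `text` is shown while start <= currentTime < end (seconds).
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Return the text of the cue active at time t, or "" if none is active.
function activeCaption(cues: readonly Cue[], t: number): string {
  const cue = cues.find((c) => c.start <= t && t < c.end);
  return cue ? cue.text : "";
}

// Structural stand-ins for the parts of HTMLVideoElement / HTMLElement used here.
interface TimedElement {
  currentTime: number;
  addEventListener(type: "timeupdate", listener: () => void): void;
}
interface TextSink {
  textContent: string | null;
}

// Re-render the caption text each time the media element fires "timeupdate"
// (e.g. a <video> whose srcObject is a stream obtained from getUserMedia).
function attachCaptions(video: TimedElement, display: TextSink, cues: readonly Cue[]): void {
  video.addEventListener("timeupdate", () => {
    display.textContent = activeCaption(cues, video.currentTime);
  });
}
```

A linear `find` is enough for a short list of cues; a long caption file would probably call for sorting the cues and using a binary search instead.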
>>
>> You could even use WebRTC data channels to transmit these captions if they
>> are sourced from the same browser as the video/audio are.
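A sketch of carrying captions over a WebRTC data channel follows. The JSON message shape (`start`, `end`, `text`) and the channel label `"captions"` are assumptions for illustration and are not part of any specification. `createDataChannel()`, the `datachannel` event and the `message` event are standard WebRTC/DOM APIs. The helper names `sendCaption`, `parseCaption` and `listenForCaptions` are hypothetical.

```typescript
// Assumed message shape: cue times in seconds on the receiver's media timeline.
interface CaptionMessage {
  start: number;
  end: number;
  text: string;
}

// Only send() is needed from an RTCDataChannel to transmit a caption.
interface CaptionSender {
  send(data: string): void;
}

// Serialize one caption as JSON and send it.
function sendCaption(channel: CaptionSender, caption: CaptionMessage): void {
  channel.send(JSON.stringify(caption));
}

// Parse received data; return null for anything that is not a well-formed caption.
function parseCaption(data: unknown): CaptionMessage | null {
  if (typeof data !== "string") return null;
  try {
    const m = JSON.parse(data);
    if (typeof m?.start === "number" && typeof m?.end === "number" && typeof m?.text === "string") {
      return { start: m.start, end: m.end, text: m.text };
    }
  } catch {
    // Not valid JSON: fall through and ignore the message.
  }
  return null;
}

// Receiving peer (browser-only, not invoked here): the sending peer created the
// channel with pc.createDataChannel("captions"); it arrives via "datachannel".
function listenForCaptions(
  pc: RTCPeerConnection,
  onCaption: (caption: CaptionMessage) => void,
): void {
  pc.addEventListener("datachannel", (ev: RTCDataChannelEvent) => {
    if (ev.channel.label !== "captions") return;
    ev.channel.addEventListener("message", (msg: MessageEvent) => {
      const caption = parseCaption(msg.data);
      if (caption) onCaption(caption);
    });
  });
}
```

On the sending side, the channel returned by `pc.createDataChannel("captions")` can be passed directly to `sendCaption`. The received captions would still have to be rendered in sync with the media, for example with a timeupdate handler.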
>>
>> But the specific API we're talking about (Media Capture and Streams) is not
>> specific to WebRTC; it strictly focuses on capturing media streams, and
>> formalizing their synchronization semantics, not how they can be then
>> transmitted or possibly synchronized with other out-of-band content.
>>
>>> I note that the Working Group Charter lists a dependency on WAI Protocols
>>> and Formats Working Group: "Reviews from the WAI PF Working Group will be
>>> required to ensure the APIs allow to create an accessible user
>>> experience."
>>
>> We've solicited feedback from the WAI PFWG both directly and via the HTML
>> Accessibility Task Force, but haven't heard back so far. I'm trying to get
>> information as to whether we should expect any.
>>
>>> I am not a member of WAI PFWG but have copied in
>>> public-pfwg@w3.org to this message to ensure they have visibility of my
>>> comment: at present I believe that the APIs do not "allow to create an
>>> accessible user experience."
>>
>> If you're talking specifically about synchronizing subtitles or captions, I
>> think the APIs, taken with the rest of the platform, do allow creating an
>> accessible user experience.
>>
>> If you're thinking of some other use cases, could you clarify which ones?
>>
>> If you don't think my assumptions about the possibility of using
>> synchronization events for captions/subtitles for an accessible user
>> experience hold, could you describe in more detail why they're not
>> sufficient? This would go a long way toward understanding what we would need
>> to change in the API.
>>
>>> I would suggest it should be a matter of priority for the Working Group to
>>> consider adding this capability. You request a proposal for a specific
>>> solution for this. One possible solution would be to extend the
>>> MediaStreamTrack.kind attribute to permit the value "data" and to have a
>>> further more specific type so that user agents can process data tracks
>>> successfully.
>>
>> But why would they need to be put into a MediaStreamTrack object when
>> they're not media content? What benefit is there in trying to put them in that
>> structure instead of keeping them as out-of-band data?
>>
>>> It may also be helpful or necessary to expose a common clock
>>> with which such data may be synchronised - further design work to
>>> establish the importance of this would be needed.
>>
>> I believe that for captioning, the clock provided by ontimeupdate provides
>> sufficient accuracy; but again, I may be missing something here, so I would
>> welcome your input as to why it would not.
>>
>>> An example of the usage scenario could be the provision of a sequence of
>>> TTML or WebVTT documents which, on presentation, provide
>>> subtitles/captions for the video or audio content. This could be achieved
>>> by having a MediaStreamTrack of kind "data" and subtype "ttml+xml" in the
>>> case of TTML.
>>
>> Clearly being able to play TTML or WebVTT documents along with playing a
>> video or audio obtained from a MediaStream is useful; but why would they
>> need to be provided in the same container as the media stream itself? As far
>> as I know, for other video sources, these documents are provided out of band
>> and synchronized by the client; this should apply with media streams
>> obtained from getUserMedia as well, without having to force them into a
>> MediaStream structure for which they're not fitted.
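This sketch shows client-side synchronization of out-of-band WebVTT cues using the HTML text track API (`addTextTrack()`, `TextTrack.addCue()`, `VTTCue`). It assumes the cue timestamps are relative to the `currentTime` of the `<video>` element playing the getUserMedia stream. A full WebVTT parser is out of scope. `parseVttTimestamp`, `RawCue` and `addCaptionTrack` are hypothetical helpers that only handle cue timestamps already split out of a file, in the `mm:ss.ttt` and `hh:mm:ss.ttt` forms.

```typescript
// Convert a WebVTT cue timestamp ("mm:ss.ttt" or "hh:mm:ss.ttt") to seconds.
function parseVttTimestamp(ts: string): number {
  const m = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`Invalid WebVTT timestamp: ${ts}`);
  const [, h, mm, ss, ms] = m;
  return (h ? Number(h) * 3600 : 0) + Number(mm) * 60 + Number(ss) + Number(ms) / 1000;
}

// Hypothetical cue tuple: [start timestamp, end timestamp, caption text].
type RawCue = [string, string, string];

// Browser-only wiring (not invoked here): add a captions track to the <video>
// element that plays the getUserMedia stream and let the browser render the
// cues as the element's currentTime advances.
function addCaptionTrack(video: HTMLVideoElement, cues: readonly RawCue[]): TextTrack {
  const track = video.addTextTrack("captions", "Captions", "en");
  track.mode = "showing";
  for (const [start, end, text] of cues) {
    track.addCue(new VTTCue(parseVttTimestamp(start), parseVttTimestamp(end), text));
  }
  return track;
}
```

Using a text track leaves the rendering and timing of the captions to the browser, as with other video sources. A hand-written timeupdate handler gives more control over presentation but has to do that work itself.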
>>
>> Thanks for working with us on this! If it would be helpful to have a call to
>> make faster progress or discuss some ideas in more detail, let me know!
>>
>> Dom
>>
>>
>
Received on Tuesday, 6 October 2015 12:02:24 UTC