Re: Media Capture and Streams Last Call review; deadline May 15 (LC-3013)

Hi Dom,

Apologies for the long response time. Comments below:

On 17/08/2015 13:50, "Dominique Hazael-Massieux" <dom@w3.org> wrote:

>Hi Nigel,
>
>Any feedback on my reply below? I would like to close the loop on this
>if possible; at the very least, it would be useful to understand if you
>intend to formally object to our disposition of your comment or not so
>that we can understand what our next steps with progressing that
>specification will need to be.
>
>Thanks,
>
>Dom
>
>On 21/07/2015 15:02, Dominique Hazael-Massieux wrote:
>> Hi Nigel,
>>
>> On 10/07/2015 16:11, Nigel Megitt wrote:
>>>
>>> Thank you for your response to my comment. I agree that the current
>>> WD does not deal with streams of data related to the media, so in
>>> that sense you have provided an accurate answer. However I am far
>>> from certain that this is acceptable. As far as I can see this
>>> constraint prevents WebRTC both from being augmented with
>>> accessibility data, for example subtitles/captions, and from being
>>> augmented with other data-based functionality such as the display
>>> of text or graphics not associated with accessibility.
>>
>> The fact that this particular API doesn't provide the necessary hooks
>> doesn't imply it's not doable with WebRTC.
>>
>> Indeed, for something like subtitle and captioning, you can already
>> re-use the existing synchronization mechanisms provided by HTML media
>> elements (e.g. ontimeupdate events) to display text synchronously with
>> the content captured via getUserMedia.
>>
>> You could even use WebRTC data channels to transmit these captions if
>> they are sourced from the same browser as the video/audio are.

Right, hence my proposal to include a "data" kind in the enumeration of
stream track types.
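
For concreteness, the sort of shape I have in mind is sketched below,
written as a TypeScript-style declaration purely for illustration; the
"data" kind value and the contentType attribute name are my own
suggestions rather than anything the group has discussed:

    // Illustrative only: a track whose kind may be "data" alongside
    // "audio" and "video", plus a more specific subtype so that user
    // agents (or scripts) know how to process the track's payload.
    interface DataCapableMediaStreamTrack extends MediaStreamTrack {
      readonly kind: "audio" | "video" | "data";
      // e.g. "application/ttml+xml" or "text/vtt" for subtitle/caption
      // streams; absent for ordinary audio/video tracks.
      readonly contentType?: string;
    }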

>> But the specific API we're talking about (Media Capture and Streams)
>> is not specific to WebRTC; it strictly focuses on capturing media
>> streams, and formalizing their synchronization semantics, not how
>> they can then be transmitted or possibly synchronized with other
>> out-of-band content.
>>
>>> I note that the Working Group Charter lists a dependency on the WAI
>>> Protocols and Formats Working Group: "Reviews from the WAI PF
>>> Working Group will be required to ensure the APIs allow to create
>>> an accessible user experience."
>>
>> We've solicited feedback from the WAI PFWG both directly and via the
>> HTML Accessibility Task Force, but haven't heard back so far. I'm trying
>> to get information as to whether we should expect any.
>>
>>> I am not a member of WAI PFWG but have copied in
>>> public-pfwg@w3.org to this message to ensure they have visibility of my
>>> comment: at present I believe that the APIs do not "allow to create an
>>> accessible user experience."
>>
>> If you're talking specifically about synchronizing subtitles or
>> captions, I think the APIs, taken with the rest of the platform, do
>> allow to create an accessible user experience.
>>
>> If you're thinking of some other use cases, could you clarify which
>> ones?
>>
>> If you don't think my assumptions about the possibility of using
>> synchronization events for captions/subtitles for an accessible user
>> experience hold, could you describe in more detail why they're not
>> sufficient? This would go a long way toward understanding what we would
>> need to change in the API.
>>
>>> I would suggest it should be a matter of priority for the Working
>>> Group to consider adding this capability. You request a proposal
>>> for a specific solution for this. One possible solution would be to
>>> extend the MediaStreamTrack.kind attribute to permit the value
>>> "data" and to have a further, more specific type so that user
>>> agents can process data tracks successfully.
>>
>> But why would they need to be put into a MediaStreamTrack object when
>> they're not media content?

I don't agree with your premise: they are media content. I understand that
in the HTML5 spec media elements are only audio and video, but I would
make the distinction between a serialised stream of subtitles/captions and
the text track cues that may be generated for the UA to present them. It's
the stream that would typically need to be bundled.
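
To put the distinction in script terms (a sketch only; the shape of the
incoming records and the onCaptionRecord entry point are hypothetical,
e.g. it might be wired to an RTCDataChannel's onmessage handler):

    // The serialised stream is whatever arrives over the transport;
    // the UA-facing representation is the set of text track cues that
    // the application (or UA) builds from it.
    const video = document.querySelector('video') as HTMLVideoElement;
    const track = video.addTextTrack('captions', 'English', 'en');
    track.mode = 'showing';

    // Hypothetical entry point, called once per decoded caption record.
    function onCaptionRecord(
        record: { start: number; end: number; text: string }) {
      track.addCue(new VTTCue(record.start, record.end, record.text));
    }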

>> What benefit is there to try and put them in that structure instead
>> of keeping them as out-of-band data?

The requirement to store them together and to maintain synchronisation
on playback applies equally to subtitles/captions, audio and video. I
don't see why you would make a distinction here.
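
To make that concrete, here is a sketch of the position today using the
MediaStream Recording API; storeRecording and storeCaptions are
hypothetical placeholders for whatever persistence is used:

    // Recording a captured stream keeps audio and video muxed together,
    // but a caption stream has to be stored separately and re-associated
    // with the recording by the application afterwards.
    async function recordWithCaptions(
        captions: Array<{ start: number; end: number; text: string }>,
        storeRecording: (data: Blob) => void,
        storeCaptions:
            (c: Array<{ start: number; end: number; text: string }>) => void) {
      const stream = await navigator.mediaDevices.getUserMedia(
          { video: true, audio: true });
      const recorder = new MediaRecorder(stream);
      const chunks: Blob[] = [];
      recorder.ondataavailable = (e) => chunks.push(e.data);
      recorder.onstop = () => {
        storeRecording(new Blob(chunks, { type: recorder.mimeType }));
        storeCaptions(captions);  // out of band; nothing in the container
      };
      recorder.start();
    }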

>>> It may also be helpful or necessary to expose a common clock with
>>> which such data may be synchronised - further design work to
>>> establish the importance of this would be needed.
>>
>> I believe that for captioning, the clock provided by ontimeupdate
>> provides sufficient accuracy; but again, I may be missing something
>> here, so would welcome your input as to why they would not.

This isn't an accuracy issue per se. It's just that whatever media
timeline positions are exposed by e.g. ontimeupdate for an audio track
need to be on the same timeline as the subtitles/captions so that the
two can be aligned. I see that the media stream always begins at time
zero, so that's probably sufficient.
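
By way of illustration, the alignment I am describing is roughly the
following (a sketch assuming the caption times are expressed on the
same zero-based timeline as the captured stream; showCaption and
hideCaption are placeholders for whatever rendering is used):

    async function playWithCaptions(
        captions: Array<{ start: number; end: number; text: string }>,
        showCaption: (text: string) => void,
        hideCaption: () => void) {
      const stream = await navigator.mediaDevices.getUserMedia(
          { video: true, audio: true });
      const video = document.querySelector('video') as HTMLVideoElement;
      video.srcObject = stream;
      await video.play();

      video.addEventListener('timeupdate', () => {
        // currentTime is zero-based, as is the caption timeline.
        const t = video.currentTime;
        const active = captions.find(c => c.start <= t && t < c.end);
        if (active) {
          showCaption(active.text);
        } else {
          hideCaption();
        }
      });
    }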

>>
>>> An example of the usage scenario could be the provision of a
>>> sequence of TTML or WebVTT documents which, on presentation,
>>> provide subtitles/captions for the video or audio content. This
>>> could be achieved by having a MediaStreamTrack of kind "data" and
>>> subtype "ttml+xml" in the case of TTML.
>>
>> Clearly being able to play TTML or WebVTT documents along with playing a
>> video or audio obtained from a MediaStream is useful; but why would they
>> need to be provided in the same container as the media stream itself?

The question is rather why they should be prevented from being provided
in the same container. It's normal practice to multiplex within a
single container all the components needed for playback of the packaged
media; subtitles/captions are one such component.

>> As far as I know, for other video sources, these documents are
>> provided out of band and synchronized by the client; this should
>> apply with media streams obtained from getUserMedia as well, without
>> having to force them into a MediaStream structure for which they're
>> not fitted.
>>
>> Thanks for working with us on this! If it would be helpful to have a
>> call to make faster progress or discuss some ideas in more detail, let
>> me know!

I'm happy to join a call to go through the use cases I have alluded to,
in case my questions stem from an incomplete understanding on my part
that we can resolve directly. Email threads are notoriously not
conducive to that kind of resolution!

Nigel


>>
>> Dom
>>
>>
>




Received on Tuesday, 18 August 2015 10:38:20 UTC