Re: Media Capture and Streams Last Call review; deadline May 15 ( LC-3013) from Dominique Hazael-Massieux on 2015-07-21 (public-pfwg@w3.org from July 2015)

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Tue, 21 Jul 2015 15:02:27 +0200
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-media-capture@w3.org" <public-media-capture@w3.org>, "public-pfwg@w3.org" <public-pfwg@w3.org>
Message-ID: <55AE42E3.3090702@w3.org>

Hi Nigel,

On 10/07/2015 16:11, Nigel Megitt wrote:
>
> Thank you for your response to my comment. I agree that the current WD
> does not deal with streams of data related to the media, so in that sense
> you have provided an accurate answer. However I am far from certain that
> this is acceptable. As far as I can see this constraint prevents WebRTC
> both from being augmented with accessibility data for example
> subtitles/captions and from being augmented with other data-based
> functionality such as the display of text or graphics not associated with
> accessibility.

The fact that this particular API doesn't provide the necessary hooks 
doesn't imply it's not doable with WebRTC.

Indeed, for something like subtitling and captioning, you can already
re-use the existing synchronization mechanisms provided by HTML media
elements (e.g. timeupdate events) to display text synchronously with
the content captured via getUserMedia.
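
A minimal sketch of that approach, under stated assumptions: the Cue
shape, the TimedMedia interface and the function names are hypothetical
and come from no specification, and cue times are assumed to be
expressed in seconds on the media element's own timeline (for a
MediaStream, relative to the start of playback).

```typescript
// Hypothetical cue shape; times are in seconds on the media element's timeline.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Minimal structural view of a media element (assumption: only these two
// members are needed; HTML media elements expose both).
interface TimedMedia {
  currentTime: number;
  addEventListener(type: "timeupdate", listener: () => void): void;
}

// Return the text of every cue active at time t
// (start inclusive, end exclusive).
function activeCueText(cues: Cue[], t: number): string[] {
  return cues.filter((c) => c.start <= t && t < c.end).map((c) => c.text);
}

// Re-render captions each time the element reports a new playback position.
function attachCaptions(
  media: TimedMedia,
  cues: Cue[],
  render: (lines: string[]) => void
): void {
  media.addEventListener("timeupdate", () => {
    render(activeCueText(cues, media.currentTime));
  });
}
```

In a page, media would be the video element whose srcObject is the
captured MediaStream, and render would update whatever element displays
the caption text.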

You could even use WebRTC data channels to transmit these captions if
they originate from the same browser as the video/audio.
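
As a rough illustration only: the JSON message format, the
CaptionMessage shape and the helper names below are assumptions made up
for this sketch. The only thing relied upon from WebRTC is that a data
channel can send a string.

```typescript
// Hypothetical wire format for one caption; nothing here is standardized.
interface CaptionMessage {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Minimal structural view of a channel that can send text
// (RTCDataChannel offers a send() that accepts a string).
interface TextSender {
  send(data: string): void;
}

// Serialize a caption as JSON and send it over the channel.
function sendCaption(channel: TextSender, cue: CaptionMessage): void {
  channel.send(JSON.stringify(cue));
}

// Parse a received message; return null if it is not a well-formed caption.
function parseCaption(data: string): CaptionMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;
  const start = fields["start"];
  const end = fields["end"];
  const text = fields["text"];
  if (typeof start !== "number" || typeof end !== "number") return null;
  if (typeof text !== "string") return null;
  return { start, end, text };
}
```

The receiving side would call parseCaption on the data of each incoming
message and hand valid captions to its own display logic.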

But the specific API we're talking about (Media Capture and Streams) is 
not specific to WebRTC; it strictly focuses on capturing media streams, 
and formalizing their synchronization semantics, not how they can be 
then transmitted or possibly synchronized with other out-of-band content.

> I note that the Working Group Charter lists a dependency on WAI Protocols
> and Formats Working Group: "Reviews from the WAI PF Working Group will be
> required to ensure the APIs allow to create an accessible user
> experience."

We've solicited feedback from the WAI PFWG both directly and via the 
HTML Accessibility Task Force, but haven't heard back so far. I'm trying 
to get information as to whether we should expect any.

> I am not a member of WAI PFWG but have copied in
> public-pfwg@w3.org to this message to ensure they have visibility of my
> comment: at present I believe that the APIs do not "allow to create an
> accessible user experience."

If you're talking specifically about synchronizing subtitles or 
captions, I think the APIs, taken with the rest of the platform, do 
allow to create an accessible user experience.

If you're thinking of some other use cases, could you clarify which ones?

If you don't think my assumptions about the possibility of using 
synchronization events for captions/subtitles for an accessible user 
experience hold, could you describe in more detail why they're not
sufficient? This would go a long way toward understanding what we would 
need to change in the API.

> I would suggest it should be a matter of priority for the Working Group to
> consider adding this capability. You request a proposal for a specific
> solution for this. One possible solution would be to extend the
> MediaStreamTrack.kind attribute to permit the value "data" and to have a
> further more specific type so that user agents can process data tracks
> successfully.

But why would they need to be put into a MediaStreamTrack object when
they're not media content? What benefit is there in trying to put them
in that structure instead of keeping them as out-of-band data?

> It may also be helpful or necessary to expose a common clock
> with which such data may be synchronised - further design work to
> establish the importance of this would be needed.

I believe that for captioning, the clock exposed via ontimeupdate
provides sufficient accuracy; but again, I may be missing something
here, so would welcome your input as to why it would not.

> An example of the usage scenario could be the provision of a sequence of
> TTML or WebVTT documents which, on presentation, provide
> subtitles/captions for the video or audio content. This could be achieved
> by having a MediaStreamTrack of kind "data" and subtype "ttml+xml" in the
> case of TTML.

Clearly being able to play TTML or WebVTT documents along with playing a 
video or audio obtained from a MediaStream is useful; but why would they 
need to be provided in the same container as the media stream itself? As
far as I know, for other video sources, these documents are provided out
of band and synchronized by the client; this should apply to media
streams obtained from getUserMedia as well, without having to force them
into a MediaStream structure for which they're not suited.

Thanks for working with us on this! If it would be helpful to have a 
call to make faster progress or discuss some ideas in more detail, let
me know!

Dom
Received on Tuesday, 21 July 2015 13:02:56 UTC