Re: Media Capture and Streams Last Call review; deadline May 15 (LC-3013) from Janina Sajka on 2015-09-30 (public-media-capture@w3.org from September 2015)

From: Janina Sajka <janina@rednote.net>
Date: Wed, 30 Sep 2015 11:06:33 -0400
To: Dominique Hazael-Massieux <dom@w3.org>
Cc: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-media-capture@w3.org" <public-media-capture@w3.org>, "public-pfwg@w3.org" <public-pfwg@w3.org>
Message-ID: <20150930150633.GJ1737@opera.rednote.net>
Hi, Dom, Nigel:

It has been my responsibility on behalf of PFWG to respond on this
specification. I apologize for being so very late in responding.

Overall, I believe PFWG can accept this API. However, I do have some
concerns that might benefit from clarification. Details below.

Before getting into my questions, I want to thank Nigel on behalf of PFWG for raising questions re text-based media
content, especially as it might pertain to WebRTC.

Dom, your response to Nigel makes sense to me. But, I think it
illustrates a larger problem around the various specs being produced at
W3C which relate to web-based media acquisition and delivery. We in PFWG
have sometimes found it difficult to understand which spec fills which
piece of the overall solution. Perhaps we need a document that names
the various components and illustrates how they fit together? PFWG is
considering suggesting this to the TAG. What do you think?

Part of the problem is the word media itself. Let us recall that the
newspaper on your stoop in the morning is considered "media." Clearly,
it's not the kind of media that W3C will address, except perhaps in its
web rendition. Similarly, to those of us who have worked on the MAUR,
and I think to the WebRTC folks, "media" includes textual components
alongside audio and video components. Consequently, I think it's
understandable that each of us may have in mind different meanings when
we see the term "media." I think just such a difference in definition
may be behind Nigel's questions, and that is indeed how I understand
Dom's response. It also tells me not to expect specification text related
to text media synchronization, as understood in the MAUR, to be discussed
in this document.

If this is correct, my first comment is:

1.)	The Abstract is inaccurate and misleading. Currently, it reads:
"This document defines a set of JavaScript APIs that allow local media,
including audio and video, to be requested from a platform."

	Again, if my above summation is correct, this Abstract should
	not claim "media," but rather "audio and video components of
	media." It is not "including" audio and video. It is exclusive
	to those two components.

		SUGGESTED SOLUTION: "This document defines a set of
		JavaScript APIs that allow local audio and video media
		to be requested from a platform."

2.)	It is unclear to me from my reading of this specification
whether you consider it sufficient to synchronize multiple video and
audio sources as our Media Accessibility User Requirements (MAUR)[1]
contemplates. Specifically, Sec. 3.8 "Sign Translation" and Sec. 3.1
"Described Video" require separate video and audio, respectively, from
separate source files.

Is this specification deemed sufficient to support these use cases? If
so, it would be very helpful to have examples in the specification that
include an additional media source of each kind, as in Secs. 3.8 and 3.1.
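As an illustration of what item 2 asks about, the sketch below shows one
way a page might keep a second source (a sign-language video per Sec. 3.8,
or a described-video audio track per Sec. 3.1) aligned with a primary
video. It assumes both are played through HTML media elements and that the
secondary source is a seekable file. The names `MediaClock` and
`correctDrift`, the offset, and the 0.1 s tolerance are hypothetical;
nothing here is defined by Media Capture and Streams, and it is not offered
as the Working Group's intended approach.

```typescript
// Hypothetical sketch: decide when a secondary source (e.g. a sign-language
// video or described-video audio) should seek to stay aligned with a primary
// media element. Only the pure decision logic is shown here.

/** Minimal view of a media element's clock; HTMLMediaElement satisfies it. */
interface MediaClock {
  currentTime: number; // seconds
}

/**
 * Returns the time the secondary source should seek to, or null when it is
 * already within `toleranceSec` of its target (or has not started yet).
 * `offsetSec` is an assumed, author-supplied start offset of the secondary
 * source relative to the primary one.
 */
function correctDrift(
  primary: MediaClock,
  secondary: MediaClock,
  offsetSec: number,
  toleranceSec = 0.1,
): number | null {
  const target = primary.currentTime - offsetSec;
  if (target < 0) return null; // secondary source is not due to start yet
  const drift = Math.abs(secondary.currentTime - target);
  return drift > toleranceSec ? target : null;
}
```

In a page, this might be called from the primary element's `timeupdate`
handler, assigning the returned value to the secondary element's
`currentTime` whenever it is not `null`. Seeking only past a tolerance
avoids a stream of tiny seeks, at the cost of allowing up to that much
drift.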

Thank you for your consideration of these comments, and for your
forbearance with my tardiness.

Janina Sajka, on behalf of Protocols & Formats WG


[1] http://w3c.github.io/pfwg/media-accessibility-reqs/


Dominique Hazael-Massieux writes:
> Hi Nigel,
> 
> On 10/07/2015 16:11, Nigel Megitt wrote:
> >
> >Thank you for your response to my comment. I agree that the current WD
> >does not deal with streams of data related to the media, so in that sense
> >you have provided an accurate answer. However I am far from certain that
> >this is acceptable. As far as I can see this constraint prevents WebRTC
> >both from being augmented with accessibility data, for example
> >subtitles/captions, and from being augmented with other data-based
> >functionality such as the display of text or graphics not associated with
> >accessibility.
> 
> The fact that this particular API doesn't provide the necessary hooks
> doesn't imply it's not doable with WebRTC.
> 
> Indeed, for something like subtitle and captioning, you can already re-use
> the existing synchronization mechanisms provided by HTML media elements
> (e.g. ontimeupdate events) to display text synchronously with the content
> captured via getUserMedia.
> 
> You could even use WebRTC data channels to transmit these captions if they
> are sourced from the same browser as the video/audio are.
> 
> But the specific API we're talking about (Media Capture and Streams) is not
> specific to WebRTC; it strictly focuses on capturing media streams, and
> formalizing their synchronization semantics, not how they can be then
> transmitted or possibly synchronized with other out-of-band content.
> 
> >I note that the Working Group Charter lists a dependency on WAI Protocols
> >and Formats Working Group: "Reviews from the WAI PF Working Group will be
> >required to ensure the APIs allow to create an accessible user
> >experience."
> 
> We've solicited feedback from the WAI PFWG both directly and via the HTML
> Accessibility Task Force, but haven't heard back so far. I'm trying to get
> information as to whether we should expect any.
> 
> >I am not a member of WAI PFWG but have copied in
> >public-pfwg@w3.org to this message to ensure they have visibility of my
> >comment: at present I believe that the APIs do not "allow to create an
> >accessible user experience."
> 
> If you're talking specifically about synchronizing subtitles or captions, I
> think the APIs, taken with the rest of the platform, do allow to create an
> accessible user experience.
> 
> If you're thinking of some other use cases, could you clarify which ones?
> 
> If you don't think my assumptions about the possibility of using
> synchronization events for captions/subtitles for an accessible user
> experience hold, could you describe in more detail why they're not
> sufficient? This would go a long way toward understanding what we would need
> to change in the API.
> 
> >I would suggest it should be a matter of priority for the Working Group to
> >consider adding this capability. You request a proposal for a specific
> >solution for this. One possible solution would be to extend the
> >MediaStreamTrack.kind attribute to permit the value "data" and to have a
> >further, more specific type so that user agents can process data tracks
> >successfully.
> 
> But why would they need to be put into a MediaStreamTrack object when
> they're not media content? What benefit is there in trying to fit them into
> that structure instead of keeping them as out-of-band data?
> 
> >It may also be helpful or necessary to expose a common clock
> >with which such data may be synchronised - further design work to
> >establish the importance of this would be needed.
> 
> I believe that for captioning, the clock provided by ontimeupdate provides
> sufficient accuracy; but again, I may be missing something here, so would
> welcome your input as to why it would not.
> 
> >An example of the usage scenario could be the provision of a sequence of
> >TTML or WebVTT documents which, on presentation, provide
> >subtitles/captions for the video or audio content. This could be achieved
> >by having a MediaStreamTrack of kind "data" and subtype "ttml+xml" in the
> >case of TTML.
> 
> Clearly being able to play TTML or WebVTT documents along with playing a
> video or audio obtained from a MediaStream is useful; but why would they
> need to be provided in the same container as the media stream itself? As far
> as I know, for other video sources, these documents are provided out of band
> and synchronized by the client; this should apply with media streams
> obtained from getUserMedia as well, without having to force them into a
> MediaStream structure for which they're not fitted.
> 
> Thanks for working with us on this! If it would be helpful to have a call to
> make faster progress or discuss some ideas in more detail, let me know!
> 
> Dom
> 
> 
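Dom's suggestion in the quoted message, displaying captions in step with
the `timeupdate` events of a media element that plays a `getUserMedia`
stream, can be illustrated with a minimal sketch. The `CaptionCue` shape
and the `activeCaptionText` helper are hypothetical, and cue times are
assumed to be expressed on the element's own `currentTime` clock.

```typescript
// Hypothetical cue shape; times are seconds on the media element's clock.
interface CaptionCue {
  start: number;
  end: number;
  text: string;
}

/**
 * Returns the text of every cue active at time t (start <= t < end),
 * joined by newlines; returns "" when no cue is active.
 */
function activeCaptionText(cues: readonly CaptionCue[], t: number): string {
  return cues
    .filter((c) => c.start <= t && t < c.end)
    .map((c) => c.text)
    .join("\n");
}

// Possible browser wiring (assumes a <video> element `video` showing a
// getUserMedia stream and a caption container `captionsDiv`):
//   video.addEventListener("timeupdate", () => {
//     captionsDiv.textContent = activeCaptionText(cues, video.currentTime);
//   });
```

Because `timeupdate` fires only periodically (the HTML specification allows
intervals of roughly 15 to 250 ms), caption changes may lag cue boundaries
by up to that much. Where that matters, the platform's own text-track
machinery (`addTextTrack()` with `VTTCue` objects) may let the user agent
handle cue timing and rendering instead.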

-- 

Janina Sajka,	Phone:	+1.443.300.2200
			sip:janina@asterisk.rednote.net
		Email:	janina@rednote.net

Linux Foundation Fellow
Executive Chair, Accessibility Workgroup:	http://a11y.org

The World Wide Web Consortium (W3C), Web Accessibility Initiative (WAI)
Chair,	Protocols & Formats	http://www.w3.org/wai/pf
Received on Wednesday, 30 September 2015 15:07:03 UTC