Re: Action-219: Draft Response to MSE on Bug 23661 from Aaron Colwell on 2013-12-16 (public-html-media@w3.org from December 2013)

From: Aaron Colwell <acolwell@google.com>
Date: Mon, 16 Dec 2013 13:22:56 -0800
To: Charles McCathie Nevile <chaals@yandex-team.ru>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CAA0c1bBknjqUc__nq0ERAHOkwiDeNcWNybVdpcRZ_3d4UfaH6w@mail.gmail.com>
Comments inline.

On Mon, Dec 16, 2013 at 7:42 AM, Charles McCathie Nevile
<chaals@yandex-team.ru> wrote:

> On Thu, 12 Dec 2013 20:25:42 +0100, Aaron Colwell <acolwell@google.com>
> wrote:
>
> For those who don't know me, I am chaals, co-coordinator of the
> Accessibility TF inter alia. Except when giving the results of a formal
> Call for Consensus I cannot speak on behalf of the TF, but I believe my
> views are at least reasonably representative as a rough guide to what the
> Task Force is likely to agree or not.
>

I appreciate your input.


>
>  Comments inline.
>>
>
> Likewise, but let me preface my remarks by explaining that the point of
> this request is to find the least painful point of consensus in the
> spectrum between insisting that all implementations meet all accessibility
> requirements in order to conform, and changing W3C policy to acknowledge
> its specifications do not actually provide for accessibility as a core
> value.
>
>
I too would like to find a good balance. I believe accessibility is
important, but I want to make sure that the text we add to specs actually
results in making things more accessible and is not just lip service. I
could agree right now and blindly add the text, but it wouldn't
necessarily result in actual accessibility improvements in implementations,
nor clarify things for implementers who want to "do the right thing."


>
>  On Thu, Dec 12, 2013 at 10:28 AM, Paul Cotton <Paul.Cotton@microsoft.com>
>> wrote:
>>
>
>  See the extract from the A11Y TF IRC log below in which I made some of
>>> the points in your response during the A11Y TF discussion this
>>> morning:
>>>
>>> http://www.w3.org/2013/12/12-html-a11y-irc
>>>
>>
>> Thanks. I appreciate this.
>>
>>  Does HTML5 have a similar note?
>>>>
>>>
>>> The TF plans to open a bug on HTML5 to cause this to happen.
>>>
>>
>> Ok. That seems like the proper path forward to me.
>>
>
> If you look at the log, you will further note that the reason for raising
> this against MSE first is that MSE is likely to ship well before HTML.


So I feel like there are two parts to this.
First, if this type of accessibility is a true core value of the W3C, it
seems like HTML should not be able to ship without it. Based on the
accessibility discussions I've observed during the ~2 years I've been
participating in the W3C, I know this is a contentious topic, and I don't
really want to rile people up again.

Second, support for sign-language video tracks has not been specified
anywhere, to my knowledge, so it is unclear what requirements this actually
places on MSE. I understand that politics and the desire to ship likely
prevent this from being added to the HTML5 train, but if anything, this
should be placed in an extension spec so that other specs like MSE can
evaluate how to properly integrate with this functionality. It is not clear
to me that a simple note saying that more than one video track needs to be
supported to handle sign-language tracks is enough. At a minimum, you'd
need to specify how selecting multiple video tracks at the same time should
work, since the current HTML5 text doesn't even allow it. That sort of
information is required to properly update the algorithms in the MSE spec
to support this use case. There are likely many other details that would
need to be ironed out before it is clear how to properly enable support for
this in MSE.


>
>
>  I object to adding this note to the MSE spec. This is an attempt to give
>>>> weight to an accessibility issue that should be solved by the spec that
>>>> defines HTMLMediaElement behavior (i.e. HTML5 & HTML.next) and
>>>> not an extension spec that is simply providing an alternate way to
>>>> supply media to the HTMLMediaElement.
>>>>
>>>
> Enabling a superior experience for users is a laudable goal. Indeed, it is
> also at the core of accessibility as understood at W3C.
>
> A general part of W3C's claims about its technology is that they work for
> all people, regardless of disability - which in this case I believe one can
> reasonably interpret as "…including those who require signed captioning and
> other advanced potential sourceBuffers to be delivered to the
> HTMLMediaElement".


In my opinion, this is too strong a claim for the W3C to make credibly.
It completely ignores the constraints of actual implementations. It is a
great goal, and we definitely should work towards enabling access in any
way that we can. I believe this is best done by first attacking the
problem at the HTMLMediaElement level. This could be done inside an HTML
spec or in a new extension spec. Either way, we need to define how the
element deals with these new use cases before we can determine how MSE
needs to be changed. I'm happy to update MSE once this behavior is
defined, but until then, I don't think such a note provides much value to
implementers or guidance for content authors.


>
>  Without arguing for the TF’s request I do want to point out they are only
>>> asking for the addition of a non-normative Note.
>>>
>>
>> I understand, but I don't think we need to add an informative note in MSE
>> indicating how multiple tracks would be useful. In my opinion this is a
>> quality of implementation issue and if implementations want to make MSE
>> content accessible then they will support more than the minimal
>> requirements.
>>
>
> It is normal for W3C specifications to support maximal accessibility "out
> of the box", since access for all is one of W3C's core values. A
> specification which did not do so and required special unexplained extra
> implementation to support basic accessibility use cases would be reasonably
> likely to attract formal objections.
>

HTML5 and/or HTML.next do not appear to support this "maximal
accessibility" right now. Are there formal objections against this? It
seems to me that supporting sign-language tracks is also an "unexplained
extra implementation" that doesn't appear to be defined anywhere. The note
does not appear to actually improve the situation.


> The proposed resolution of the Task Force assumes there will be shipping
> implementations incapable of supporting these use cases, but nevertheless
> useful in more restricted environments. It also assumes that there are
> people who expect to support accessible use cases by default, and indeed to
> look for solutions which do so as a matter of preference. It recognises
> this as a quality of implementation issue. It does not assume that the
> *only* way to provide high-quality accessibility is through the use of MSE.
> It merely requests that the specification acknowledge that a minimally
> conforming implementation may not satisfy certain use cases.
>

If I add a note along the lines of "The minimal requirement of 1 video and
1 audio track may not be sufficient to support accessibility use cases like
sign-language or audio description tracks.", how does this help? It may
cause people to think that these use cases cannot be supported with MSE on
these restricted implementations, which is not true. You could still use
MSE to display sign-language or alternate audio even if only one track of
each type is allowed. The "may" here leaves too much open to
interpretation, and the note could end up simply being a lie that
prematurely scares people off. What is the goal here? How does this
actually improve accessibility?


>
> Indeed, the simplest method of satisfying it I can think of is adding a
> note on the addSourceBuffer method, after the definition of minimum
> requirements, pointing out that for some use cases, including
> accessibility-related ones such as signed video captioning, additional
> capability is necessary.


While I agree that this is the simplest approach and the likely path to
consensus, I worry that "additional capability is necessary" is not really
helpful to the reader. There are no references to specs that indicate what
additional capability is actually needed. Perhaps this could be outlined in
extension specs to MSE, but for now, I don't see how this improves things.


>
>
>  I think people are reading too much into the 1 audio track and 1 video
>> track requirements. The primary purpose of these 2 bullet points was to
>> make sure that both "multiple tracks per SourceBuffer" and "multiple
>> SourceBuffers with a single track" must be supported by implementations.
>> The 1 track requirement is simply a reflection of the fact that many
>> devices will only be able to support these 2 configurations. Obviously
>> if a UA was able to support sign language video tracks, then they would
>> go beyond the minimal requirements.
>>
>
> It is not a priori obvious that a conforming implementation of a W3C
> Recommendation is unable to support basic use cases for accessibility. It
> is certainly not the general message that W3C promotes with regard to its
> specifications.
>

I feel like MSE is being held to a higher standard here just because it
expresses a reality that HTML5 doesn't fess up to. What if I modified the
text so that it ignored this reality and only said something along the
lines of:
"If an implementation supports a specific combination of N tracks in a
single SourceBuffer, then it also must support the same N tracks
distributed across M SourceBuffers where M > 1 and M <= N."

Would the need for the note go away? This would bring MSE into equal
vagueness with HTML5 on this issue and would not exclude your use cases. I
would prefer making this normative change instead of an informative change
that I believe would have little impact on making content more accessible.
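To make the proposed normative sentence concrete, here is a rough TypeScript
sketch that enumerates the SourceBuffer layouts it would require. It is not
part of MSE or any browser API: `Track`, `partitions` and `requiredLayouts`
are hypothetical names, and it assumes that "distributed across M
SourceBuffers" means any split of the N tracks into M non-empty groups.

```typescript
// Hypothetical track model for illustration only; not an MSE type.
type Track = { kind: "audio" | "video"; id: string };

// Returns every way to split `items` into non-empty, unordered groups
// (the set partitions of `items`).
function partitions<T>(items: T[]): T[][][] {
  if (items.length === 0) return [[]];
  const [first, ...rest] = items;
  const result: T[][][] = [];
  for (const p of partitions(rest)) {
    // Option 1: `first` starts a new group of its own.
    result.push([[first], ...p]);
    // Option 2: `first` joins one of the existing groups.
    for (let i = 0; i < p.length; i++) {
      result.push(p.map((group, j) => (j === i ? [first, ...group] : group)));
    }
  }
  return result;
}

// Layouts the proposed sentence would require once all N tracks are
// supported in a single SourceBuffer. Each group stands for one
// SourceBuffer, so M = layout.length; M > 1 is filtered here, and
// M <= N holds automatically for any partition.
function requiredLayouts(tracks: Track[]): Track[][][] {
  return partitions(tracks).filter((layout) => layout.length > 1);
}

// Example: main video, sign-language video and one audio track.
const tracks: Track[] = [
  { kind: "video", id: "main" },
  { kind: "video", id: "sign-language" },
  { kind: "audio", id: "main" },
];
for (const layout of requiredLayouts(tracks)) {
  console.log(
    layout.map((sb) => sb.map((t) => `${t.kind}:${t.id}`).join("+")).join(" | ")
  );
}
```

For the three-track example this lists four layouts, from two SourceBuffers
up to one SourceBuffer per track. With N = 2 (one audio, one video) it
yields only the two-SourceBuffer layout, which matches the existing minimum
configurations.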

Aaron



>
> cheers
>
> Chaals
>
> --
> Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
>         chaals@yandex-team.ru         Find more at http://yandex.com
>
Received on Monday, 16 December 2013 21:23:25 UTC