
Re: Action-219: Draft Response to MSE on Bug 23661

From: Charles McCathie Nevile <chaals@yandex-team.ru>
Date: Tue, 17 Dec 2013 11:17:24 +0100
To: "Aaron Colwell" <acolwell@google.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <op.w78eva1jy3oazb@chaals.local>
On Mon, 16 Dec 2013 22:22:56 +0100, Aaron Colwell <acolwell@google.com>
wrote:

> Comments inline..


I apologise for the length of this mail, especially as it seems to me to
be repeatedly answering the same few questions phrased slightly
differently. But I felt it was important not to leave something
apparently unanswered just because it didn't seem to introduce anything
new to me, which has meant I haven't had the time to shorten this for
easier readability. I believe there is a meeting today at which this
topic may come up, so I wanted to provide the input in time for that.

> On Mon, Dec 16, 2013 at 7:42 AM, Charles McCathie Nevile <chaals@yandex
>> On Thu, 12 Dec 2013 20:25:42 +0100, Aaron Colwell <acolwell@google.com>

> I too would like to find a good balance. I believe accessibility is
> important, but I want to make sure that the text we add to specs actually
> results in making things more accessible and not just be lip service.

Yes, I think we are on the same page with that.

> I could just agree right now and blindly add in the text but it wouldn't
> necessarily result in actual accessibility improvements in
> implementations nor clarify things for implementers that want to "do the
> right thing."

The Call for Consensus was framed about as openly as I could manage
while keeping it clear what we were looking for, because I think that
"blindly adding some text" would indeed not be the smart approach.

On the other hand, I think the accessibility community is compromising a
fair bit in walking away from requirements that support practical
implemented approaches, and instead looking for something that provides
a reasonable way forward.

>>  On Thu, Dec 12, 2013 at 10:28 AM, Paul Cotton  
>> <Paul.Cotton@microsoft.com>
>>>>  See the extract from the A11Y TF IRC log below in which I made some of
>>>> the points in your response during the A11Y TF discussion this
>>>> morning:
>>>> http://www.w3.org/2013/12/12-html-a11y-irc
>>> Thanks. I appreciate this.
>>>  Does HTML5 have a similar note?
>>>> The TF plans to open a bug on HTML5 to cause this to happen.
>>> Ok. That seems like the proper path forward to me.
>> If you look at the log, you will further note that the reason for  
>> raising this against MSE first is that MSE is likely to ship well
>> before HTML.
> So I feel like there are 2 parts to this.
> First, if this type of accessibility is a true core value of the W3C it
> seems like HTML should not be able to ship w/o this.

Indeed. I would expect very strong objection at the AC level if HTML
simply ignored the use case. (After all, some people pay their
membership fee specifically to work on accessibility in W3C - I can
think of at least a dozen members where that is pretty close to their
only reason for being in W3C.)

> Based on the accessibility discussions I've observed during the ~2 years  
> I've been
> participating in the W3C, I know this is a contentious topic and I
> don't really want to rile people up again.

No. And I certainly don't want to go back to the bad old days of
intractable "he-said-she-said" discussions that look suspiciously like
the participants aim to score debating points rather than improve things
for users.

> Second, sign-language video tracks support has not been specified  
> anywhere to my knowledge so it is unclear what requirements this
> actually places on MSE.

As I understand the MSE spec, and Adrian's explanation of why 23661 "Works  
for Him", it already explains how to handle multiple sourceBuffers and  
doesn't constrain them *not* to be e.g. 2 or more video tracks.

Section 2.4.5 refers to the HTML5 spec's concept of "selected video",
but that apparently doesn't conflict, in either MSE or HTML5, with the
ability to use media controllers (as elaborated below). In fact I
couldn't find any practical impact of the selected video concept at all,
beyond a DOM attribute that is only true for one video at a time.
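To make that concrete, here is a sketch of what I understand the spec to
allow (the codec strings and the omission of error handling are mine;
this is an illustration, not a definitive implementation):

```javascript
// Sketch only: codec strings are illustrative, and real code must check
// MediaSource.isTypeSupported() and handle QuotaExceededError etc.
var mediaSource = new MediaSource();
var video = document.querySelector('video');
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', function () {
  // The addSourceBuffer() requirements set a floor of one audio and one
  // video configuration, but nothing in the algorithm forbids a second
  // video SourceBuffer, e.g. for a sign-language track.
  var mainVideo = mediaSource.addSourceBuffer('video/webm; codecs="vp8"');
  var signVideo = mediaSource.addSourceBuffer('video/webm; codecs="vp8"');
  var audio = mediaSource.addSourceBuffer('audio/webm; codecs="vorbis"');
  // ...append initialization and media segments to each buffer...
});
```

Whether a given UA throws on that second video addSourceBuffer() call is
precisely the quality-of-implementation question under discussion.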

> I understand that politics and the desire to ship likely prevents this  
> from being added to the HTML5 train, but if anything this should be  
> placed in an extension spec so that other specs like MSE can evaluate
> how to properly integrate with this functionality. It is not clear to me
> that a simple note saying that more than 1 video track needs to be
> supported to handle sign-language tracks is enough. At a minimum you'd
> need to specify how multiple video tracks being selected at one time
> should work since the current HTML5 text doesn't even allow it.

Actually, it is not only allowed but supported via the mediagroup
attribute.

At least one well-known source of information about HTML specifications
[w3school] describes it, claiming support in 5 browsers, and I tested it
in mine (which is not one of the five), where it worked fine.
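By way of illustration (a markup sketch only; the file names are
hypothetical), the mediagroup attribute declaratively slaves two video
elements to one implicit MediaController:

```html
<!-- Sketch only: file names are hypothetical. Elements sharing a
     mediagroup value are slaved to the same implicit MediaController,
     so they play, pause and seek in lockstep. -->
<video src="main-programme.webm" mediagroup="lecture" controls></video>
<video src="sign-language.webm" mediagroup="lecture"></video>
```

Script can build the same arrangement imperatively by assigning a shared
MediaController object to each element's controller property.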

> That sort of information is required to properly update the algorithms
> in the MSE spec to support this use case.

If that were true, Adrian's "Works for Me" claim on the bug should be  
changed until we figure out if it does work. But it seems you're already  
doing far better than you claim, and that Adrian's claim is justifiable as  
a technical response.

> There are likely many other details that would need to be ironed out
> before it is clear how to properly enable support for this in MSE.

Having looked carefully, I cannot find any details in the algorithms that  
would need to be changed. Which accords with my understanding that this  
can already be supported in practice. Of course I am open to explanation  
of what specific things would not work in MSE, and looking at how to deal  
with such issues.

>>>>> I object to adding this note to the MSE spec. This is an attempt
>>>>> to give weight to an accessiblity issue that should be solved by
>>>>> the spec that defines HTMLMediaElement behavior (ie HTML5 &
>>>>> HTML.next) and not an extension spec that is simply providing an
>>>>> alternate way to supply media to the HTMLMediaElement.

>> Enabling a superior experience for users is a laudable goal. Indeed, it  
>> is also at the core of accessibility as understood at W3C.
>> A general part of W3C's claims about its technology is that they work  
>> for all people, regardless of disability - which in this case I believe
>> one can reasonably interpret as "…including those who require signed
>> captioning and other advanced potential sourceBuffers to be delivered
>> to the HTMLMediaElement".
> In my opinion, this is too strong of a claim for the W3C to make  
> credibly. It completely ignores the constraints of actual  
> implementations.

Does it? Your Google colleague outlined a strategy YouTube apparently  
proposes to ship and claimed that the code supported the objection. I  
outlined another strategy, implemented in running code in a real  
University. Adrian already claimed that this can be done, as did James  
Craig within the HTML Accessibility Task Force.

> I believe it is a great goal and we definitely should work towards
> enabling access in any way that we can. I believe this is best done by
> first attacking this problem at the HTMLMediaElement level. This could be
> inside an HTML spec or a new extension spec. Either way, we need to
> define how the element deals with these new use cases before we can
> determine how MSE needs to be changed.

Do you disagree that media controllers allow for this use case in the  
current HTML specification? (The current specification may not be ideal,  
but it appears to me that James and Adrian and W3Schools are correct and  
this can already be done).

> I'm happy to update MSE when this behavior is defined, but until
> then, I don't really think that such a note provides much value to
> implementers or guidance for content authors.

Until when? Was Adrian wrong to close the bug WfM, are others wrong in  
pointing out that HTML supports multiple videos through media controllers?

>> Without arguing for the TF’s request I do want to point out they are  
>> only asking for the addition of a non-normative Note.
>>> I understand, but I don't think we need to add an informative note in  
>>> MSE indicating how multiple tracks would be useful. In my opinion this
>>> is a quality of implementation issue and if implementations want to
>>> make MSE content accessible then they will support more than the
>>> minimal requirements.
>> It is normal for W3C specifications to support maximal accessibility  
>> "out of the box", since access for all is one of W3C's core values. A
>> specification which did not do so and required special unexplained extra
>> implementation to support basic accessibility use cases would be  
>> reasonably likely to attract formal objections.
> HTML5 and/or HTML.next does not appear to support this "maximal
> accessibility" right now. Are there formal objections for this?

No, and as explained I believe they are unnecessary because I believe your  
colleagues (rather than you) are correct and HTML *does* support this use  
case right now. If that really isn't the case I certainly expect formal  
objections to the HTML5 specification in due course.

(In the more general sense, yes there are outstanding objections against  
HTML5 where it does currently fail to support accessibility, but given  
that it is at a different stage of advancement that isn't actually  
relevant right now).

I have also seen the use case of multiple synchronised video tracks  
running in multiple different HTML systems, and while I realise that "a  
demo can work" isn't the same as "it really works" I am quite surprised to  
hear you contradicting Adrian's statement in particular.

If it is really true that this doesn't work, I think we should learn
why, and importantly how others who are actually working on the same
spec come to different conclusions.

> It seems to me that supporting sign-language tracks is also an
> "unexplained extra implementation" that doesn't appear to be defined
> anywhere. The note does not appear to actually improve the situation.

I am working on the assumption that HTML5 does support multiple  
synchronised tracks.

>> The proposed resolution of the Task Force assumes there will be shipping
>> implementations incapable of supporting these use cases, but  
>> nevertheless useful in more restricted environments. It also assumes
>> that there are people who expect to support accessible use cases by
>> default, and indeed to look for solutions which do so as a matter of
>> preference. It recognises this as a quality of implementation issue.
>> It does not assume that the *only* way to provide high-quality
>> accessibility is through the use of MSE. It merely requests that the
>> specification acknowledge that a minimally conforming implementation
>> may not satisfy certain use cases.
> If I add a note along the lines of "The minimal requirement of 1 video  
> and 1 audio track may not be sufficient to support accessibility use
> cases like sign-language or audio description tracks.", how does this
> help? It may cause people to think that these use cases could not be
> supported with MSE on these restricted implementations, which is not
> true.

Agreed, although that would force particular implementation strategies
that I think it is unreasonable to assume are appropriate for all uses,
particularly given the concrete evidence that implementors felt it
necessary to support use cases that would be incredibly difficult to
handle with that approach.

> You could still use MSE to display sign-language or alternate audio
> even if only one track of each type is allowed. It seems like the
> "may" here leaves too much open to interpretation and this note could
> end up simply being a lie and prematurely scare people off.

Agreed. Getting the text correct is important, and I am happy to work on
ensuring we do so, if you have accepted the request for some
acknowledgement that the minimal configuration is not necessarily
sufficient, and that higher-quality implementations may better support
use cases considered important.

> What is the goal here?

To ensure that those reading the spec (e.g. implementors, people who are
basing purchase requirements on it, and people who are using it as a
source for teaching others) are aware that some use cases are likely, in
practice, to require the "higher quality" implementations that offer
configurations beyond the minimum requirement listed for conformance to
the specification.

> How does this actually improve accessibility?

By avoiding the mistake people often fall into of *assuming* that
following the spec equates to getting everything they want. When the bar
for conformance to the specification is set so low that it automatically
excludes certain legitimate use cases (I assume you are not arguing that
the use cases are illegitimate, since you haven't so far suggested
that), the responsible thing to do is to clarify in the spec that this
is the case.

>> Indeed, the simplest method of satisfying it I can think of is adding a
>> note on the addSourceBuffers method, after the definition of minimum
>> requirements, pointing out that for some use cases, including
>> accessibility-related ones such as signed video captioning, additional
>> capability is necessary.
> While I agree that this is the simplest and the likely path for  
> consensus, I worry that "additional capability is necessary" is not
> really helpful to the reader. There are no references to specs that
> indicate what additional capability is actually needed.

I expect the editors, being very familiar with the specification, could  
help us improve any suggestion to make sure it is true and includes any  
necessary pointers or qualifications.

> Perhaps this could be outlined in extension specs to MSE, but for
> now, I don't really see how this improves things.

I don't think an extension specification to MSE is necessary, and I think  
there is good reason to believe that resolving the discussion that way  
will lead to a very expensive and poor quality outcome.

>>>  I think people are reading too much into the 1 audio track and 1 video
>>> track requirements. The primary purpose of these 2 bullet points were  
>>> to make sure that both "multiple tracks per SourceBuffer" and"multiple  
>>> SourceBuffers with a single track" must be supported by  
>>> implementations.
>>> The 1 track requirement is simply a reflection of the fact that many
>>> devices will only be able to support these 2 configurations.

For some value of "many".

Rather than argue about whether such essentially legacy devices conform
to the requirements of a modern web, we are not trying to change the
requirement.

But it is one thing to suggest that Opera Mini is a useful tool for
accessing the Web (it is, for zillions of users) and another to say "so
it does everything you need on the web, with a feature phone" (which
might appear in press releases, but is demonstrably misleading in normal
use).
We're asking for the specification to err on the side of being a  
technically clear document, rather than a PR piece.

At the same time, we are asking to work with the editors and Working Group  
to ensure that relevant statements describing limitations introduced or  
implied by the spec are indeed accurate.

>>> Obviously if a UA was able to support sign language video tracks, then
>>> they would go beyond the minimal requirements.
>> It is not a priori obvious that a conforming implementation of a W3C
>> Recommendation is unable to support basic use cases for accessibility.  
>> It is certainly not the general message that W3C promotes with regard  
>> to its specifications.
> I feel like MSE is being held to a higher standard here just because it
> expresses a reality that HTML5 doesn't fess up to.

I don't think so. And as I note, if I am mistaken and HTML5 really doesn't  
support the use cases I don't think you should assume that it won't be  
held to the same standard.

> What if I modified the text so it ignored this reality and only said
> something along the lines of:
>  "If an implementation supports a specific combination of N tracks in a
> single SourceBuffer, then it also must support the same N tracks
> distributed across M SourceBuffers where M > 1 and  M <= N."

To be honest, my likely reaction to such a change would be to become
somewhat less confident about your willingness to look collaboratively
for sensible solutions to identified problems which you profess to want
to solve.
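For concreteness, my reading of the quantifier in that proposed text can
be sketched as follows (the function name is mine, purely illustrative):

```javascript
// Sketch of my reading of the proposed text: if an implementation
// supports some combination of N tracks in a single SourceBuffer, it
// must also support the same tracks distributed across M SourceBuffers
// for every M with 1 < M <= N. This lists the required buffer counts.
function requiredBufferCounts(nTracks) {
  var counts = [];
  for (var m = 2; m <= nTracks; m++) {
    counts.push(m);
  }
  return counts;
}

// Supporting 3 tracks in one SourceBuffer would then also require
// supporting the same tracks split across 2 or 3 SourceBuffers:
console.log(requiredBufferCounts(3)); // [ 2, 3 ]
```

Note that, as I read it, the rule says nothing about whether more than
one video track need be supported at all, which is the point at issue.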

> Would the need for the note go away?

I don't think so. Although it makes the spec far harder for anyone to
understand, with no apparent benefit to anyone, it *seems* to me on
first read that it effectively doesn't support the use case, nor
acknowledge that the use case is unsupported.

> This would bring MSE into equal vagueness with HTML5 on this issue
> and would not exclude your use cases.

As stated earlier, I believe HTML5 does actually support the use cases,  
and that this assertion would therefore be untrue.

> I would prefer making this normative change instead of an informative  
> change that I believe would have little impact on making content more  
> accessible.

I hope you decide that retreating into obscurity is not a good approach to  
dealing with issues.

I do not want to insist on a particular text that is less than honest, nor  
one that is sufficiently unclear that it is effectively misleading. I want  
to work constructively to help clarify to users of the MSE specification  
what is required to support use cases that I believe it and HTML do  
actually support.

[w3school] http://www.w3schools.com/TAGS/av_prop_mediagroup.asp



Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
       chaals@yandex-team.ru         Find more at http://yandex.com
Received on Tuesday, 17 December 2013 10:18:01 UTC
