Re: Action-219: Draft Response to MSE on Bug 23661

On Thu, 12 Dec 2013 20:50:33 +0100, Steven Robertson <strobe@google.com>
wrote:

> YouTube intends to use MSE to improve accessibility by offering the user  
> the ability to switch content streams on the fly. Currently we do *not*  
> plan to implement this using AudioTracks/VideoTracks but rather by  
> switching the streams that get appended using JS. We are doing this so  
> that the feature can work reliably across all devices, including those  
> which lack the technical capability to support A/VTracks.

Right, this is one valid implementation strategy. Note that it requires  
you to hold, and mix on the server side, all the necessary pieces, which  
goes beyond the simple adaptations typically required to make MSE worth  
using.
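
For concreteness, a rough sketch of that strategy is below (the segment  
URLs and the getSegmentUrl() helper are made up for illustration, not  
taken from YouTube's implementation): the player keeps a single  
SourceBuffer and simply appends segments from whichever rendition the  
user has selected, e.g. one with signing burned in.

  // Sketch only: switch the rendition (e.g. to one with burned-in signing)
  // by changing which segments get appended, rather than by exposing a
  // second VideoTrack. Segment URLs and getSegmentUrl() are hypothetical.
  const video = document.querySelector('video') as HTMLVideoElement;
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);

  let currentStream = 'main';   // or 'signed', as chosen by the user
  let sourceBuffer: SourceBuffer;

  mediaSource.addEventListener('sourceopen', () => {
    sourceBuffer = mediaSource.addSourceBuffer(
      'video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
    appendNextSegment(0);
  });

  function getSegmentUrl(stream: string, index: number): string {
    return `/media/${stream}/segment-${index}.m4s`;  // hypothetical naming
  }

  async function appendNextSegment(index: number): Promise<void> {
    const response = await fetch(getSegmentUrl(currentStream, index));
    const data = await response.arrayBuffer();
    sourceBuffer.addEventListener('updateend',
      () => appendNextSegment(index + 1), { once: true });
    sourceBuffer.appendBuffer(data);
  }

  // "Switching" is then just a matter of fetching from the other source; a
  // real player would also manage buffered ranges and re-append init
  // segments.
  function selectStream(stream: string): void {
    currentStream = stream;
  }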

Note that the signing video being developed by LaTrobe University (which,  
according to its own statements, currently has the largest deaf student  
population in Australia) to support hearing-impaired students is designed  
around the use case of letting the student reposition the two video  
tracks relative to each other: swapping the video-in-video display so  
that either video can be the "container", and moving the smaller video  
around on screen at any time to keep it clear of the visually important  
areas. It seems the YouTube proposal could not support these requirements  
efficiently; it would require a large amount of re-encoding to serve a  
relatively small but, for the University, mission-critical audience.
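
What LaTrobe needs looks more like the rough sketch below, assuming the  
signing video arrives as a separate stream rendered in its own element  
(the element ids and the 'inset' class are illustrative only): the swap  
and repositioning happen entirely on the client, with no re-encoding.

  // Sketch only: programme and signing delivered as two independent
  // streams, each in its own <video> element, so the user can swap which
  // one is the "container" and drag the inset around.
  const programme = document.getElementById('programme') as HTMLVideoElement;
  const signing = document.getElementById('signing') as HTMLVideoElement;

  // Swap which element is full-size and which is the inset, without
  // touching either media pipeline.
  function swapContainer(): void {
    programme.classList.toggle('inset');
    signing.classList.toggle('inset');
  }

  // Move whichever element currently has the inset role, e.g. to keep it
  // clear of the visually important areas of the main picture.
  function moveInset(x: number, y: number): void {
    const inset = document.querySelector('video.inset') as HTMLVideoElement;
    inset.style.left = `${x}px`;
    inset.style.top = `${y}px`;
  }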

> This supports Aaron's objection;

I don't think so.

> conflating the user-facing goal of improving accessibility by having
> the ability to select more appropriate content and one technical
> implementation of that goal will limit the adoption of other
> strategies

It may well do so, and I agree that it would be a mistake to make such a  
conflation: for example, by assuming that the approach taken by YouTube is  
available to, and appropriate for, everyone else, or by assuming that  
signed captioning can only be done with a second independent video track.

(A third obvious technique is to use avatars, and transmit something that  
is not video at all over the wire - but while this has been considered a  
great idea for at least two decades, it's still in the "nice demo, but not  
really generally usable" stage as far as I know).

> which have broader support.

That's a judgement call that relies on a number of assumptions. To be  
fair, so is the word "typically" in the proposed resolution of the TF, and  
on reflection it seems that it should be easy to remove any bias toward  
one assumption or the other and arrive at a result acceptable to all.

But even if it turns out that we can prove one solution has, and will  
continue to have, broader support, a failure to adequately support  
legitimate alternative implementation strategies should be justified on  
technical grounds. As you note above, assuming that a solution which works  
for many cases is the right one for everyone else too is a path to making  
specifications that unfairly distort perceptions of which use cases are  
appropriate to support.

Please note that I am not accusing you of actually making that assumption,  
but it seems to me that upholding your objection would lead people to  
think in that direction.

cheers

Chaals

-- 
Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
         chaals@yandex-team.ru         Find more at http://yandex.com

Received on Monday, 16 December 2013 16:16:58 UTC