Re: EME impact on accessibility

Hi Alastair,

That's all, huh? Apologies for the following rather lengthy (and
occasionally opinionated) response.

*TL;DR:*

Having had a ringside seat as an active participant in much of this
debate, here's my biased perspective: despite ongoing claims that EME
impacts accessibility, we've not seen any actual proof of that, even
though, as you note, EME has been in browsers for a couple of years now.
Surely if there were accessibility issues with EME we'd have heard of
them by now, right? Especially if we've been actively listening for that
particular "ping from the cosmos" (as I have been).

And what I have heard is {{crickets}}

**********

Breaking down your questions with what *I* know (and I freely admit I
likely don't know it all, but I've been very active here, so...)

> *Captions.*
> If captions are available they must be un-encrypted, so there shouldn’t be
> an issue there.

Pretty much. There are two specific scenarios: 1) support materials are
provided "out-of-band", and 2) support materials are provided "in-band".

*In-band:*

What this means (in case you or anyone else reading this is unaware) is
that file formats such as MP4 and MKV are actually "wrapper" formats, used
to contain media content and related materials. Think of them as similar to
CAB or even ZIP files, where the container travels across the web as a
complete, single entity and is then expanded or "opened" at the user end.

So an MP4 file can contain an H.264 encoded video, AAC encoded audio, and,
if desired during post-production, additional files (such as TTML/WebVTT
files, and/or other manifest files, support files, etc.) can also be
included in the "wrapper".

As part of the effort around HTML5's <video> element, an API was also
developed that allows browsers (user agents) to open and extract "track"
files from the container wrapper
(https://www.w3.org/TR/html5/embedded-content-0.html#audiotracklist-and-videotracklist-objects),
although AFAICT it is only supported in Apple products today (not
surprisingly, as I recall the API was developed by Eric Carlson, an Apple
engineer).
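For illustration, here is a minimal TypeScript sketch of how a page script might use that track-list API to switch on an in-band audio description. It assumes the container's audio tracks are exposed as an `AudioTrackList` (the API linked above) and relies on the HTML spec's audio track kinds `"descriptions"` (the description alone) and `"main-desc"` (programme audio already mixed with the description). The helper names are hypothetical, and the small interface is a stand-in for the DOM type so the sketch can run outside a browser.

```typescript
// Minimal stand-in for the DOM's AudioTrack (only the fields used here).
interface AudioTrackLike {
  id: string;
  kind: string; // e.g. "main", "descriptions", "main-desc", "translation"
  language: string;
  enabled: boolean;
}

// Hypothetical helper: return the first track of the given kind, or null.
function findByKind(tracks: ArrayLike<AudioTrackLike>, kind: string): AudioTrackLike | null {
  for (let i = 0; i < tracks.length; i++) {
    if (tracks[i].kind === kind) return tracks[i];
  }
  return null;
}

// Hypothetical helper: turn on an in-band audio description if the
// container carries one. Returns the enabled track's id, or null.
function enableAudioDescription(tracks: ArrayLike<AudioTrackLike>): string | null {
  // "main-desc" is a complete mix, so it replaces the other audio tracks.
  const mixed = findByKind(tracks, "main-desc");
  if (mixed !== null) {
    for (let i = 0; i < tracks.length; i++) {
      tracks[i].enabled = tracks[i] === mixed;
    }
    return mixed.id;
  }
  // "descriptions" carries only the description, so main audio stays on.
  const desc = findByKind(tracks, "descriptions");
  if (desc !== null) {
    desc.enabled = true;
    return desc.id;
  }
  return null;
}
```

In a browser that implements the API, the call would be `enableAudioDescription(video.audioTracks)`. Leaving the main track on in the `"descriptions"` case works because audio tracks, unlike video tracks, may be enabled in combination.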

In this scenario, the EME spec states:

*Unencrypted In-band Support Content*

In-band support content, such as captions, described audio, and
transcripts, should not be encrypted.
NOTE

Decryption of such tracks - especially such that they can be provided back
to the user agent - is not generally supported by implementations. Thus,
encrypting such tracks would prevent them from being widely available for
use with accessibility features in user agent implementations.


After asking about this, my understanding is as follows: the "MP4" file can be
encrypted and decrypted via the EME API, however all content inside the
wrapper must be unencrypted. Since content "entering" a browser environment
(user agent) is first decrypted "at the door", the opened or expanded
wrapper container will actually provide content to the end user
unencrypted. In other words, once you have access to the content inside the
wrapper file, none of that content is further blocked from user
interaction. Lacking legal access to the content inside the MP4 means
*all* content is blocked to the end user, disabled or otherwise. On the
other hand, if you have a legal right to access the content inside the
MP4 wrapper, you have access to *ALL* of it, including the support
materials.



*Out-of-band:*
Now the other means of providing support materials is "out-of-band", which
I think most folks conceptually understand, as this is the use-case for
introducing the <track> child element of <video>.
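A minimal TypeScript sketch of the out-of-band case, for anyone who has not worked with `<track>` directly: it attaches a WebVTT captions file to a video from script, which has the same effect as writing the element in the markup. The file name `captions-en.vtt` and the helper `addCaptionTrack` are hypothetical, and the two small interfaces stand in for `HTMLVideoElement` and `HTMLTrackElement` so the sketch can run outside a browser.

```typescript
// Minimal structural stand-ins for the DOM types. In a page you would use
// a real <video> element and document.createElement("track") instead.
interface TrackElementLike {
  kind: string;
  src: string;
  srclang: string;
  label: string;
  default: boolean;
}

interface VideoElementLike {
  appendChild(child: TrackElementLike): unknown;
}

// Hypothetical helper: attach an out-of-band WebVTT captions file.
// The .vtt file is fetched as its own discrete resource, separate from
// the (possibly EME-protected) media stream.
function addCaptionTrack(
  video: VideoElementLike,
  createTrack: () => TrackElementLike,
  src: string,
  srclang: string,
  label: string,
): TrackElementLike {
  const track = createTrack();
  track.kind = "captions"; // other kinds: "subtitles", "descriptions", "chapters", "metadata"
  track.src = src;
  track.srclang = srclang;
  track.label = label;
  track.default = true; // show this track unless the user picks another
  video.appendChild(track);
  return track;
}
```

The equivalent static markup would be `<track kind="captions" src="captions-en.vtt" srclang="en" label="English" default>` inside the `<video>` element. Either way, the `.vtt` file is an ordinary, separately fetched resource.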

In this scenario, support files are referenced via the <track> element, and
those files are delivered to the user agent independently, much as .jpg and
other image files aren't "embedded" (in the OLE sense) in web pages, but are
instead referenced by the code, with each referenced file travelling over the
net as a discrete file. In this scenario, the EME spec states:

Implementations that choose to support encrypted support content must provide
the decrypted data to the user agent to be processed in the same way as
equivalent unencrypted timed text tracks
<https://www.w3.org/TR/html51/semantics-embedded-content.html#timed-text-tracks>.

...and so, by design, EME allows (demands?) unencrypted text tracks in
the user agent (browser). In other words, all of the decryption happens
before the content even renders in the browser, and once rendered in the
browser, the end user can interact with that content "unimpeded" (with the
exception that the streamed content can only be "viewed" and not saved).
Remember, EME is for *streaming video* only, and cannot be re-purposed for
other uses today (AFAIK):

"This proposal extends HTMLMediaElement
<https://www.w3.org/TR/html51/semantics-embedded-content.html#htmlmediaelement-htmlmediaelement>
[HTML51 <https://www.w3.org/TR/encrypted-media/#bib-HTML51>] providing
APIs to control playback of encrypted content."
(https://www.w3.org/TR/encrypted-media/)
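To make that scope concrete: when a page asks for a key system through EME, the configuration it passes to `navigator.requestMediaKeySystemAccess()` (a list of `MediaKeySystemConfiguration` dictionaries) describes audio and video capabilities only; there is no field for text tracks. The TypeScript sketch below builds such a configuration for MP4 content using Common Encryption (`"cenc"` init data). The helper name and the codec strings are illustrative assumptions, and the local interfaces are trimmed-down stand-ins for the DOM's own definitions.

```typescript
// Trimmed-down stand-ins for the EME dictionaries (the DOM lib has the full
// versions). There are audioCapabilities and videoCapabilities, but nothing
// for captions, descriptions or other text tracks.
interface MediaCapability {
  contentType: string;
}

interface KeySystemConfig {
  initDataTypes: string[];
  audioCapabilities: MediaCapability[];
  videoCapabilities: MediaCapability[];
}

// Hypothetical helper: one configuration for CENC-protected MP4.
// The codec strings (AAC-LC audio, H.264 video) are example values only.
function buildKeySystemConfig(): KeySystemConfig[] {
  return [
    {
      initDataTypes: ["cenc"],
      audioCapabilities: [{ contentType: 'audio/mp4; codecs="mp4a.40.2"' }],
      videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
    },
  ];
}
```

In a browser the result would be passed as `navigator.requestMediaKeySystemAccess("org.w3.clearkey", buildKeySystemConfig())`, using the spec's baseline Clear Key system as an example. Caption files referenced by `<track>` are fetched and rendered outside this negotiation.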

************

Continuing with your other questions/scenarios:


> *Audio description.*
> I assume audio-description would simply be a separate audio stream or
> separate video, I don’t see an issue there.


Essentially covered by the same logic and principles as
captions/subtitles, differing only in file format. Video description
can be provided via a separate audio stream included inside the wrapper
format, or referenced using the <audio> element and "slaved" to the video;
alternatively, 'text' descriptions (a relatively new possibility), which are
then 'processed' by TTS engines, can also be included inside the wrapper
format (in-band) or referenced via the child <track> element of <video>
(out-of-band). (This also holds true for transcripts, BTW...)

"Audio descriptions can be provided, either as a separate track embedded in
the video stream, or a separate audio track in an audio
<https://www.w3.org/TR/html5/embedded-content-0.html#the-audio-element>
element slaved
<https://www.w3.org/TR/html5/embedded-content-0.html#slaved-media-elements>
to the same controller as the video
<https://www.w3.org/TR/html5/embedded-content-0.html#the-video-element>
element(s), or in text form using a WebVTT file
<https://www.w3.org/TR/html5/infrastructure.html#webvtt-file> referenced
using the track
<https://www.w3.org/TR/html5/embedded-content-0.html#the-track-element>
element and synthesized into speech by the user agent."
(https://www.w3.org/TR/html5/embedded-content-0.html#the-video-element)
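The last option in that passage (text descriptions in a WebVTT file, voiced by speech synthesis) can also be done from script. Below is a minimal TypeScript sketch, assuming a `<track kind="descriptions">` whose cues are `VTTCue`s: whenever the set of active cues changes, their text is handed to a speech callback. The names `speakActiveCues` and `speak` are hypothetical; in a browser the callback could wrap the Web Speech API's `speechSynthesis.speak()`. Native voicing of description tracks varies by user agent, so this illustrates the idea rather than how any particular browser does it.

```typescript
// Minimal stand-in for a VTTCue: only the text payload is needed here.
interface DescriptionCue {
  text: string;
}

// Hypothetical helper: pass the text of every currently active description
// cue to a speech callback, skipping empty cues. Returns what was spoken.
// Intended to be called from the description track's "cuechange" handler.
function speakActiveCues(
  activeCues: ArrayLike<DescriptionCue> | null,
  speak: (text: string) => void,
): string[] {
  const spoken: string[] = [];
  if (activeCues === null) return spoken; // track mode "disabled": no active cues
  for (let i = 0; i < activeCues.length; i++) {
    const text = activeCues[i].text.trim();
    if (text.length > 0) {
      speak(text);
      spoken.push(text);
    }
  }
  return spoken;
}
```

In a page, the `cuechange` handler would pass `track.activeCues` and a callback along the lines of `text => speechSynthesis.speak(new SpeechSynthesisUtterance(text))`. A fuller implementation would also need to pause or duck the programme audio while a description is read, which this sketch ignores.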



> *Enlargement of content.*
> I’m not sure how this is affected. The video is encrypted, but I believe
> that its size can be adjusted within a page. Captions and the timings that
> drive them are not encrypted so should not be affected by EME.

Enlargement (vaguely defined here) would be a function of the browser, and
enacted/provided by the browser after decryption.

EME is the "greeter" at the front door of the browser: once you clear EME
(i.e. you are 'authenticated/cleared' to view the protected premium
content), EME then "gets out of the way" and allows browsers to do what
they do. (A very quick check with one browser and one source, Netflix in
Chrome on Windows, confirms to me that I cannot "zoom" or enlarge the
content on my screen, but then again, I can't do that on my TV either...)
Bottom line: this appears to be a constraint of the browser, neither
introduced nor imposed by EME (but I suspect more testing would be required
there to categorically prove or disprove the assertion).



> *Auto captioning of the audio stream.* So encrypting the video & audio would
> (theoretically at least) prevent a 3rd party from running auto-captioning
> software on the audio.
>
> However, the companies with the capability to do that (YouTube, Microsoft,
> Amazon etc) are very closely correlated with the companies applying the
> DRM. Would this be an issue in practice? Presumably the responsibility for
> providing captions is on the provider who has the non-encrypted copy,
> therefore they are not prevented from auto-captioning?


This is an interesting use-case. In principle, I suppose that "agents" (be
they humans or APIs) not authorized to consume content will not be able to
perform functions like this. However, as you note, this only means that the
content creator is still obligated to provide the captions in order to remain
"lawful" w.r.t. providing accessibility support for video content.

EME was conceived primarily to protect "premium content" (i.e. commercially
produced entertainment content), and while in theory it could be applied
to *all* video content, there is a cost/benefit ratio involved that acts as
a bit of a filter (access to the CDM - Content Decryption Module - is a
licensed activity, with an associated cost borne by the content
owner). Additionally, while speech-to-text continues to improve at a
near-daily pace, as accessibility professionals we know that the accuracy
of this technology today is less than ideal.

Finally, by logic (but untested), it would seem that once you have
satisfied the "right to consume" requirement that the DRM imposes (as
processed via the EME API), all content is then "unencumbered" by the
encryption when rendered in the browser, so in theory at least the audio
could then be "listened to" and converted to text. Alastair, are you aware
of any actual instances where this has proven to be an issue?




> *Facial recognition.* I’m not entirely sure what the purpose of this would
> be, identifying people/actors/characters as they come and go? I can tell
> Amazon already has that information as meta-data for their videos as the
> interface can show you who is in the scene. I suspect they add that with a
> more manual process though, as it doesn’t match whether the face is on
> screen or not, just whether they are in the scene.
> Theoretically this would prevent 3rd party access to facial recognition,
> but is it something that would be the responsibility of the provider
> anyway? Not sure.


Again... to me, this just seems to be grasping at straws.

I suspect you'd have to construct a very complex use-case to show how this
specifically and explicitly was an "accessibility issue". I'd be happy to
hear that tap-dance, however; I cannot envision one myself. Does anyone
here know of a software tool or accessibility requirement that is dependent
on facial recognition? (Frankly, my opinion is that anti-EME proponents
will throw anything and everything against the wall because they
fundamentally disagree with the premise that spawned EME in the first
place, which is: premium content owners are permitted by law to restrict
and control access to digital files they have invested in and created as
part of a for-profit enterprise. I'm very much a "free as in speech, but not
as in beer" kind of guy.)



> *Color filtering.* On iOS (at least) colour filtering can be done at the
> hardware level, and if you have colour issues then presumably you’d want it
> on all the time, rather than just for videos?


(Also referenced as "Daltonization" by Cory Doctorow and others).

At first, this seemed to be a potential "Did we miss this?" question at the
APA WG (and among the participants of the Media Accessibility Task Force
who created the MAUR). As a well-known "anti-Apple" kind of guy, I cannot
speak to the iOS mechanism, but I did do some testing last summer around
this concern. As you note, PwD who require specialized color palettes to
accommodate their visual impairments will likely require this for *all*
content consumed, and not just premium video content.

I knew that ZoomText allowed for user-specified color palettes in the
browser, and so I again went to Netflix, launched a video, and then
"applied" a customized color palette via ZoomText. Sure enough, it "worked"
(where "worked" = the visual interface was modified by the software to
provide the 'required' or specified color modifications). These changes were
applied to both the "chrome" (user controls) and the content rendered in
the viewport of the video player, even when I went "full screen". (I am
unsure of *how* ZoomText achieves this, but it appeared that an overlay
filter of sorts was invoked, as when I attempted to do a screen capture,
the capture "lost" the colorization; I had to take a photo of the screen
with the colorization as "proof".)

And so, based upon the following user-story/requirement ("As a person with
visual impairments, I need to be able to modify the color palette of
content in my browser window to those that meet my needs"), I was able to
demonstrate that the requirement could be met. Whether or not this
is the same with the "hardware" solution provided by iOS I am unsure, but
at this time I would chalk that up to an issue with the user agent, and not
with EME per se (because I was able to successfully address the
user-story/requirement using software on my rig).

Do we need more testing and investigation here? Likely, and there is an
effort inside the W3C to continue this type of testing and data
gathering. (If you are interested in being involved in that effort,
ping me; all help is gratefully accepted.)

**********

<rant>
I am fed-up, up-to-here, with anti-EME proponents playing the scary
"accessibility" card for political gain, without spending the time or
effort to support their claims.

They are relying not on logic or evidence, but rather on
non-accessibility-experts' 'fear' that they may run afoul of the law with
regard to digital content. W3C protocol 'forbids' me from casting
aspersions on specific fellow W3C colleagues, but it is my personal opinion
that many of the more vocal EME opponents really don't care that much about
the needs of PwD on the web, but rather simply see this as an easy means
of casting doubt and confusion around EME, because they don't like the
politics of it. We then see others echo "accessibility concerns" without
specifics in their responses as a reason not to advance the EME API spec
at the W3C.

That angers me to no end! It trivializes and politicizes the real issues
and problems PwD experience on the web today, without once providing
evidence that EME has a negative impact on those people. It "sounds" bad,
ergo it must be bad.

Bull feathers!!!
</rant>

JF

On Wed, Apr 5, 2017 at 8:22 AM, Alastair Campbell <acampbell@nomensa.com>
wrote:

> Hi everyone,
>
>
>
> I’m trying to get some information to make a choice without getting into a
> bun-fight on a contentious topic. I’d like to get to the facts of the
> situation without talking about the good/bad of EME in general, so please
> bear that in mind.
>
>
>
> *Background:*
>
> The W3C has “Encrypted Media Extensions” [1] at Proposed Recommendation
> stage, the spec that defines the API from the browser to a DRM module.
> Several W3C members are objecting to it on the grounds of the impact it has
> on security and accessibility.
>
>
>
> *Questions:*
>
> What I’d like to focus on is the theoretical and practical implications
> for accessibility. For example, from my reading:
>
>
>
> - *Captions.* If captions are available they must be un-encrypted, so
>   there shouldn’t be an issue there.
>
> - *Audio description.* I assume audio-description would simply be a
>   separate audio stream or separate video, I don’t see an issue there.
>
>
>
> Other items raised by people to do with accessibility are as follows, with
> my own comments under the item:
>
>
>
> - *Enlargement of content.* I’m not sure how this is affected. The video
>   is encrypted, but I believe that its size can be adjusted within a page.
>   Captions and the timings that drive them are not encrypted so should not
>   be affected by EME.
>
> - *Auto captioning of the audio stream.* So encrypting the video & audio
>   would (theoretically at least) prevent a 3rd party from running
>   auto-captioning software on the audio.
>
>   However, the companies with the capability to do that (YouTube,
>   Microsoft, Amazon etc) are very closely correlated with the companies
>   applying the DRM. Would this be an issue in practice? Presumably the
>   responsibility for providing captions is on the provider who has the
>   non-encrypted copy, therefore they are not prevented from
>   auto-captioning?
>
> - *Facial recognition.* I’m not entirely sure what the purpose of this
>   would be, identifying people/actors/characters as they come and go? I
>   can tell Amazon already has that information as meta-data for their
>   videos as the interface can show you who is in the scene. I suspect they
>   add that with a more manual process though, as it doesn’t match whether
>   the face is on screen or not, just whether they are in the scene.
>   Theoretically this would prevent 3rd party access to facial
>   recognition, but is it something that would be the responsibility of
>   the provider anyway? Not sure.
>
> - *Color filtering.* On iOS (at least) colour filtering can be done at
>   the hardware level, and if you have colour issues then presumably you’d
>   want it on all the time, rather than just for videos?
>
>
>
> Given that EME has been implemented in browsers for several years, the
> question is whether the W3C blesses the spec, and I’d like some solid
> information on the accessibility aspects before commenting.
>
>
>
> Kind regards,
>
>
>
> -Alastair
>
>
>
> [1] https://www.w3.org/TR/2017/PR-encrypted-media-20170316/
>
>
>



-- 
John Foliot
Principal Accessibility Strategist
Deque Systems Inc.
john.foliot@deque.com

Advancing the mission of digital accessibility and inclusion

Received on Wednesday, 5 April 2017 16:11:08 UTC