Re: EME impact on accessibility from Judy Brewer on 2017-04-05 (w3c-wai-gl@w3.org from April to June 2017)

From: Judy Brewer <jbrewer@w3.org>
Date: Wed, 5 Apr 2017 12:30:12 -0400
To: John Foliot <john.foliot@deque.com>, Alastair Campbell <acampbell@nomensa.com>
Cc: WCAG <w3c-wai-gl@w3.org>, Jonathan Avila <jon.avila@ssbbartgroup.com>
Message-ID: <a97eb5b8-7281-e0e0-25ff-4e418545f9dd@w3.org>

Hi Alastair and All,

FYI, there has been review on these and other potential concerns about 
EME and accessibility in several threads over the months and years.

Team-side, we've been pulling these together into a summary that I hope 
will be available either later today or tomorrow.

A quick summary for now is that from a perspective of possible blocks to 
research on accessibility adaptations, for every case that we've looked 
at, research on these kinds of adaptations is best done in open video 
streams in any case, and such research is not blocked in any way by the 
existence, or not, of some form of encryption. And when it comes to 
application of such adaptations, in every case we've looked at so far, 
adaptations can be handled the same way that one ordinarily handles 
passing information back and forth by agreements. By the way, for more 
traditional types of accessibility information (captions, descriptions, 
transcripts) in practice those are generally available in the clear in 
any case, regardless of any encryption, and that's also exactly what the 
spec calls for; and for the infrequent exceptions (for instance, open 
captions that are burned into a primary video object), those are 
decrypted along with the video.

I'm updating a summary document of this kind of info and will bounce a copy 
of that to this list (among others) as soon as available, in hopes that 
people don't unnecessarily burn more ink on this. I'd continue to be 
interested in any questions or concerns once that updated summary is 
available -- and appreciate your keeping us updated on any concerns.

Thanks,

- Judy


On 4/5/2017 12:10 PM, John Foliot wrote:
> Hi Alastair,
>
> That's all huh? Apologies for the following rather lengthy (and 
> occasionally opinionated) response.
>
> *TL;DR:*
>
> Having had a ring-side seat and as an active participant in much of 
> this debate, here's my biased perspective: despite ongoing claims that 
> EME impacts accessibility, we've not seen any actual proof of that, 
> despite, as you note, the fact that EME has been in browsers for a 
> couple of years now - surely if there were issues concerning 
> accessibility and EME we'd have heard of them by now, right? Especially if 
> we've been actively listening for that particular "ping from the 
> cosmos" (as I have been).
>
> And what I have heard is {{crickets}}
>
> **********
>
> Breaking down your questions with what *I* know (and I freely admit I 
> likely don't know it all, but I've been very active here, so...)
>
> > *Captions.*
>
> If captions are available they must be un-encrypted, so there 
> shouldn’t be an issue there.
>
>
> Pretty much. We have two specific scenarios, 1) support materials are 
> provided "out-of-band", and 2) support materials are provided "in-band".
>
> *In-band:*
>
> What this means (in case you or anyone else reading this is unaware) 
> is that file formats such as MP4 and MKV are actually "wrapper" 
> formats, used to contain media content and related materials. Think of 
> them as similar to CAB files, or even ZIP files, where the container 
> travels across the web as a complete and single entity, and is then 
> expanded or "opened" at the user-end.
>
> So an MP4 file can contain an H.264 encoded video, AAC encoded audio, 
> and, if desired during post-production, additional files (such as 
> TTML/WebVTT files, and/or other manifest files, support files, etc.) 
> can also be included in the "wrapper".
>
> As part of the effort around HTML5's <video> element, there was also 
> an API developed that allows browsers (user agents) to open and 
> extract "track" files from the container wrapper 
> (https://www.w3.org/TR/html5/embedded-content-0.html#audiotracklist-and-videotracklist-objects) 
> although AFAICT it only has support in Apple products today (not 
> surprisingly, as I recall the API was developed by Eric Carlson, an 
> Apple engineer).
>
> In this scenario, the EME spec states:
>
>     *Unencrypted In-band Support Content*
>
>     In-band support content, such as captions, described audio, and
>     transcripts, should not be encrypted.
>
>     NOTE
>
>     Decryption of such tracks - especially such that they can be
>     provided back the user agent - is not generally supported by
>     implementations. Thus, encrypting such tracks would prevent them
>     from being widely available for use with accessibility features in
>     user agent implementations.
>
>
> After asking about this, my understanding is this: the "MP4" file can 
> be encrypted and decrypted via the EME API, however all content inside 
> the wrapper must be unencrypted. Since content "entering" a browser 
> environment (user agent) will first decrypt the content "at the door", 
> the opened or expanded wrapper container will actually provide content 
> to the end user - unencrypted. In other words, once you have access to 
> the content inside the wrapper file, none of that content is further 
> blocked from user interaction. Failing to have legal access to the 
> content inside of the MP4 means *all* content is blocked to the end 
> user - disabled or otherwise. On the other hand, if you have a legal 
> right to access the content inside the MP4 wrapper, you have access to 
> *ALL* of it, including the support materials.
>
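A minimal sketch of the in-band case described above, assuming a simplified model in which each track inside the wrapper carries its own "encrypted" flag. The Track and Container types, the SUPPORT_KINDS set and the file name are hypothetical; they only illustrate the spec's "should not be encrypted" guidance and are not part of EME or of any MP4 library.

```python
from dataclasses import dataclass, field
from typing import List

# Track kinds treated as accessibility support content in this sketch (an assumption).
SUPPORT_KINDS = {"captions", "descriptions", "transcript"}


@dataclass
class Track:
    kind: str        # e.g. "video", "audio", "captions"
    codec: str       # e.g. "H.264", "AAC", "WebVTT"
    encrypted: bool  # hypothetical per-track flag used only by this model


@dataclass
class Container:
    name: str
    tracks: List[Track] = field(default_factory=list)


def encrypted_support_tracks(container: Container) -> List[Track]:
    """Return the support tracks that are encrypted, i.e. those that go
    against the 'should not be encrypted' guidance quoted above."""
    return [t for t in container.tracks
            if t.kind in SUPPORT_KINDS and t.encrypted]


# Hypothetical wrapper: protected video and audio, captions in the clear.
movie = Container("movie.mp4", [
    Track("video", "H.264", encrypted=True),
    Track("audio", "AAC", encrypted=True),
    Track("captions", "WebVTT", encrypted=False),
])

print(encrypted_support_tracks(movie))  # prints []
```

An empty result means every support track in the modelled wrapper is available in the clear; any track the function returns is one that accessibility features in the user agent would likely be unable to use.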
> *Out-of-band:*
>
> Now the other means of providing support materials is "out-of-band", 
> which I think most folks conceptually understand, as this is the 
> use-case for introducing the <track> child element of <video>.
>
> In this scenario, support files are referenced via the <track> 
> element, and those files are delivered to the user agent 
> independently, in a fashion similar to how .jpg or other graphic files 
> aren't "embedded" (OLE) in web pages, but rather are referenced by the 
> code, and the referenced file travels over the net as a discrete file. 
> In this scenario, the EME spec states:
>
>     Implementations that choose to support encrypted support content
>     must provide the decrypted data to the user agent to be processed
>     in the same way as equivalent unencrypted |timed text tracks
>     <https://www.w3.org/TR/html51/semantics-embedded-content.html#timed-text-tracks>|.
>
> ...and so, by design, EME allows (demands?) for unencrypted text 
> tracks in the user-agent (browser). In other words, all of the 
> decryption happens before the content even renders in the browser, and 
> once rendered in the browser, the end-user can interact with that 
> content "unimpeded" (with the exception that the streamed content can 
> only be "viewed" and not saved.) Remember, EME is for *streaming 
> video* only, and cannot be re-purposed for other uses today (AFAIK):
>
>     "This proposal extends |HTMLMediaElement
>     <https://www.w3.org/TR/html51/semantics-embedded-content.html#htmlmediaelement-htmlmediaelement>| [HTML51
>     <https://www.w3.org/TR/encrypted-media/#bib-HTML51>] providing
>     APIs to control playback of encrypted content."
>     (https://www.w3.org/TR/encrypted-media/)
>
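A small sketch of the out-of-band arrangement described above, in which support files are referenced through <track> children of <video> and fetched as separate files. The helper name video_markup and the file names are hypothetical; kind, src, srclang and label are standard <track> attributes.

```python
from html import escape


def video_markup(src, tracks):
    """Build a <video> element whose support files are referenced
    out-of-band through <track> child elements."""
    lines = [f'<video src="{escape(src, quote=True)}" controls>']
    for track in tracks:
        attrs = " ".join(
            f'{name}="{escape(str(value), quote=True)}"'
            for name, value in track.items()
        )
        lines.append(f"  <track {attrs}>")
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical file names, for illustration only.
markup = video_markup("movie.mp4", [
    {"kind": "captions", "src": "captions.en.vtt",
     "srclang": "en", "label": "English"},
    {"kind": "descriptions", "src": "descriptions.en.vtt",
     "srclang": "en", "label": "English descriptions"},
])
print(markup)
```

Each referenced file travels over the net on its own, much like a .jpg referenced from a page, so it is delivered to the user agent independently of the media file.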
> ************
>
> Continuing with your other questions/scenarios:
>
>
>     *Audio description.*
>     I assume audio-description would simply be a separate audio stream
>     or separate video, I don’t see an issue there.
>
>
> Essentially covered by the same logic and principles as 
> captions/sub-titles, only with a different file format. Video 
> description can be provided via a separate audio stream included 
> inside the wrapper format or referenced using the <audio> element and 
> "slaved" to the video; conversely 'text' descriptions (a relatively 
> new possibility) which are then 'processed' by TTS engines can also be 
> included inside the wrapper format (in-band) or referenced via the 
> child <track> element of <video> (out-of-band). (This also holds true 
> for Transcripts BTW...)
>
>     Audio descriptions can be provided, either as a separate track
>     embedded in the video stream, or a separate audio track in an
>     |audio
>     <https://www.w3.org/TR/html5/embedded-content-0.html#the-audio-element>| element
>     slaved
>     <https://www.w3.org/TR/html5/embedded-content-0.html#slaved-media-elements> to
>     the same controller as the |video
>     <https://www.w3.org/TR/html5/embedded-content-0.html#the-video-element>| element(s),
>     or in text form using a WebVTT file
>     <https://www.w3.org/TR/html5/infrastructure.html#webvtt-file> referenced
>     using the |track
>     <https://www.w3.org/TR/html5/embedded-content-0.html#the-track-element>| element
>     and synthesized into speech by the user agent
>     (https://www.w3.org/TR/html5/embedded-content-0.html#the-video-element)
>
>
>
>     *Enlargement of content.*
>     I’m not sure how this is affected. The video is encrypted, but I
>     believe that its size can be adjusted within a page. Captions and
>     the timings that drive them are not encrypted so should not be
>     affected by EME.
>
> Enlargement (vaguely defined here) would be a function of the 
> browser, and enacted/provided post decryption by the browser.
>
> EME is the "greeter" at the front door of the browser - once you clear 
> EME (i.e. you are 'authenticated/cleared' to view the protected 
> premium content), EME then "gets out of the way" and allows browsers 
> to do what they do. (A very quick check with one browser and one 
> source - Netflix in Chrome on Windows - confirms to me that I cannot 
> "zoom" or enlarge the content on my screen - but then again, I can't 
> do that on my TV either...). Bottom line: this appears to be a 
> constraint of the browser, and not introduced nor impeded by EME (but 
> I suspect more testing would be required there to categorically prove 
> or disprove the assertion).
>
>
>     *Auto captioning of the audio stream.*
>     So encrypting the video & audio would (theoretically at least)
>     prevent a 3rd party from running auto-captioning software on the
>     audio.
>
>     However, the companies with the capability to do that (YouTube,
>     Microsoft, Amazon etc) are very closely correlated with the
>     companies applying the DRM. Would this be an issue in practice?
>     Presumably the responsibility for providing captions is on the
>     provider who has the non-encrypted copy, therefore they are not
>     prevented from auto-captioning?
>
>
> This is an interesting use-case. In principle, I suppose that 
> "agents" (be they humans or APIs) not authorized to consume content 
> will not be able to perform functions like this. However, as you note, 
> this only means that the content creator is otherwise obligated to 
> provide the captions to remain "lawful" w.r.t. providing accessibility 
> support of video content.
>
> EME was conceived primarily to protect "premium content" (i.e. 
> commercially produced entertainment content), and while in theory it 
> could be applied to *all* video content, there is a cost/benefit ratio 
> involved that acts as a bit of a filter (access to the CDM - Content 
> Decryption Module - is a licensed activity, and has a cost associated 
> to it borne by the content owner). Additionally, while speech-to-text 
> continues to improve at a near-daily pace, as accessibility 
> professionals we know that the accuracy of this technology today is 
> less than ideal.
>
> Finally, by logic (but untested), it would seem that once you have 
> satisfied the "right to consume" requirement that the DRM imposes (and 
> is processed via the EME API), that all content is then "unencumbered" 
> by the encryption when rendered in the browser, so in theory at least 
> the audio could then be "listened to" and converted to text. Alastair, 
> are you aware of any actual instances where this has proven to be an 
> issue?
>
>
>
>     *Facial recognition.*
>     I’m not entirely sure what the purpose of this would be,
>     identifying people/actors/characters as they come and go? I can
>     tell Amazon already has that information as meta-data for their
>     videos as the interface can show you who is in the scene. I
>     suspect they add that with a more manual process though, as it
>     doesn’t match whether the face is on screen or not, just whether
>     they are in the scene.
>     Theoretically this would prevent 3rd party access to facial
>     recognition, but is it something that would be the responsibility
>     of the provider anyway? Not sure.
>
>
> Again... to me, this just seems to be grasping at straws.
>
> I suspect you'd have to construct a very complex use-case to show how 
> this specifically and explicitly was an "accessibility issue". I'd be 
> happy to hear that tap-dance however, but I cannot envision one 
> myself. Does anyone here know of a software tool or accessibility 
> requirement that is dependent on facial recognition? (Frankly, my 
> opinion is that anti-EME proponents will throw anything and everything 
> against the wall because they just fundamentally disagree with the 
> premise which spawned EME in the first place, which is: Premium 
> content owners are permitted by law to restrict and control access to 
> digital files they have invested in and created, as part of a 
> for-profit enterprise. I'm very much a Free as in speech, but not as 
> in beer kind of guy).
>
>
>     *Color filtering.*
>     On iOS (at least) colour filtering can be done at the hardware
>     level, and if you have colour issues then presumably you’d want it
>     on all the time, not just for videos?
>
>
> (Also referenced as "Daltonization" by Cory Doctorow and others).
>
> At first, this seemed to be a potential "Did we miss this?" question 
> at the APA WG (and among the participants of the Media Accessibility 
> Task Force who created the MAUR). As a well-known "anti-Apple" kind of 
> guy, I cannot speak to the iOS mechanism, but I did do some testing 
> last summer around this concern. As you note, PwD who require 
> specialized color palettes to accommodate their visual impairments will 
> likely require this for *all* content consumed, and not just Premium 
> video content.
>
> I knew that ZoomText allowed for user-specified color palettes in the 
> browser, and so I again went to Netflix, launched a video, and then 
> "applied" a customized color palette via ZoomText. Sure enough, it 
> "worked" (where worked = the visual interface was modified by the 
> software to provide the 'required' or specified color modifications). 
> These changes were applied to both the "chrome" (user controls) as 
> well as the content rendered in the view-port of the video player - 
> even when I went "full screen". (I am unsure of *how* ZoomText 
> achieves this, but it appeared that an overlay filter of sorts was 
> invoked, as when I attempted to do a screen capture, the capture 
> "lost" the colorization - I had to take a photo of the screen with the 
> colorization as "proof")
>
> And so, based upon the following user-story/requirement ("As a person 
> with visual impairments, I need to be able to modify the color palette 
> of content in my browser window to those that meet my needs"), I was 
> able to demonstrate that I was able to meet that requirement. Whether 
> or not this is the same with the "hardware" solution provided by iOS I 
> am unsure, but at this time I would chalk that up to an issue with the 
> user-agent, and not because of EME per se (because I was able to 
> successfully address the user-story/requirement using software on my rig).
>
> Do we need more testing and investigation here? Likely, and there is 
> an effort inside of the W3C to continue to do this type of testing, 
> and gathering of data. (If you are interested in being involved in 
> that effort, ping me; all help is gratefully accepted.)
>
> **********
>
> <rant>
> I am fed-up, up-to-here, with anti-EME proponents playing the scary 
> "accessibility" card for political gain, without spending the time or 
> effort supporting their claims.
>
> They are relying not on logic or evidence, but rather on 
> non-accessibility-experts' 'fear' that they may run afoul of the law 
> with regard to digital content. W3C protocol 'forbids' me from casting 
> aspersions on specific fellow W3C colleagues, but it is my personal 
> opinion that many of the more vocal EME opponents really don't care 
> that much about PwD's needs on the web, but rather simply see that 
> this is but an easy and simple means of casting doubt and confusion 
> around EME, because they don't like the politics of it. We then see 
> others echo "accessibility concerns" without specifics in their 
> responses as a reason to not advance the EME API Spec at the W3C.
>
> That angers me to no end!   It trivializes and politicizes the real 
> issues and problems PwD experience on the web today - without once 
> providing evidence that EME has a negative impact on those people. It 
> "sounds" bad, ergo it must be bad.
>
> Bull feathers!!!
> </rant>
>
> JF
>
> On Wed, Apr 5, 2017 at 8:22 AM, Alastair Campbell 
> <acampbell@nomensa.com> wrote:
>
>     Hi everyone,
>
>     I’m trying to get some information to make a choice without
>     getting into a bun-fight on a contentious topic. I’d like to get
>     to the facts of the situation without talking about the good/bad
>     of EME in general, so please bear that in mind.
>
>     *Background:*
>
>     The W3C has “Encrypted Media Extensions” [1] at Proposed
>     Recommendation stage, the spec that defines the API from the
>     browser to a DRM module. Several W3C members are objecting to it
>     on the grounds of the impact it has on security and accessibility.
>
>     *Questions:*
>
>     What I’d like to focus on is the theoretical and practical
>     implications for accessibility. For example, from my reading:
>
>     -*Captions.*
>     If captions are available they must be un-encrypted, so there
>     shouldn’t be an issue there.
>
>     -*Audio description.*
>     I assume audio-description would simply be a separate audio stream
>     or separate video, I don’t see an issue there.
>
>     Other items raised by people to do with accessibility are as
>     follows, with my own comments under the item:
>
>     -*Enlargement of content.*
>     I’m not sure how this is affected. The video is encrypted, but I
>     believe that its size can be adjusted within a page. Captions and
>     the timings that drive them are not encrypted so should not be
>     affected by EME.
>
>     -*Auto captioning of the audio stream.*
>     So encrypting the video & audio would (theoretically at least)
>     prevent a 3rd party from running auto-captioning software on the
>     audio.
>
>     However, the companies with the capability to do that (YouTube,
>     Microsoft, Amazon etc) are very closely correlated with the
>     companies applying the DRM. Would this be an issue in practice?
>     Presumably the responsibility for providing captions is on the
>     provider who has the non-encrypted copy, therefore they are not
>     prevented from auto-captioning?
>
>     -*Facial recognition.*
>     I’m not entirely sure what the purpose of this would be,
>     identifying people/actors/characters as they come and go? I can
>     tell Amazon already has that information as meta-data for their
>     videos as the interface can show you who is in the scene. I
>     suspect they add that with a more manual process though, as it
>     doesn’t match whether the face is on screen or not, just whether
>     they are in the scene.
>     Theoretically this would prevent 3rd party access to facial
>     recognition, but is it something that would be the responsibility
>     of the provider anyway? Not sure.
>
>     -*Color filtering.*
>     On iOS (at least) colour filtering can be done at the hardware
>     level, and if you have colour issues then presumably you’d want it
>     on all the time, not just for videos?
>
>     Given that EME has been implemented in browsers for several years,
>     the question is whether the W3C blesses the spec, and I’d like
>     some solid information on the accessibility aspects before commenting.
>
>     Kind regards,
>
>     -Alastair
>
>     [1] https://www.w3.org/TR/2017/PR-encrypted-media-20170316/
>
>
>
>
> -- 
> John Foliot
> Principal Accessibility Strategist
> Deque Systems Inc.
> john.foliot@deque.com
>
> Advancing the mission of digital accessibility and inclusion
Received on Wednesday, 5 April 2017 16:30:38 UTC