RE: Requirements for external text alternatives for audio/video from Sean Hayes on 2010-03-27 (public-html-a11y@w3.org from March 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sat, 27 Mar 2010 15:55:31 +0000
To: Markku Hakkinen <mhakkinen@acm.org>
CC: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Eric Carlson <eric.carlson@apple.com>, Geoff Freed <geoff_freed@wgbh.org>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>, Matt May <mattmay@adobe.com>, Philippe Le Hegaret <plh@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B9119EFEB84@DB3EX14MBXC315.europe.corp.microsoft.c>
Access to a braille display would be a system-level feature, so no, I don't think it need be precluded.

From: markku.hakkinen@gmail.com [mailto:markku.hakkinen@gmail.com] On Behalf Of Markku Hakkinen
Sent: Saturday, March 27, 2010 12:16 AM
To: Sean Hayes
Cc: Silvia Pfeiffer; Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
Subject: Re: Requirements for external text alternatives for audio/video

I hope you are not suggesting then that the captions are also not available through an accessibility API? (e.g., an AT user wants the captions rendered to a refreshable Braille display...).

A user with low vision may wish to display captions outside of the frame of the video, allowing for very large text which would otherwise obscure the video.  Does what you suggest preclude this?

I've had examples in the past where shared viewing situations have one person watching, one reading large print captions, and one following with refreshable Braille on a single PC.  I wouldn't want to preclude those possibilities by locking up the captions.


On Fri, Mar 26, 2010 at 5:43 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
I don't think any exposure of the caption text should be made to HTML. There is afaik no exposure to the pixels of an img, the samples of audio, or the frames of a video. Captions and subtitles should be, from the perspective of HTML, a black box media essence in the same manner. This will obviate any IP, security or cross-domain issues. So for me #5 is the closest starting point, but nothing needs to show up in the HTML DOM, and we don't have to define any new layout format or triggering, since TTML specifies all that already (and could be extended if something is missing).

The normal CSS inheritance rules apply to the parent <video> or <audio> element. I would suggest that we define that those styles would then inherit down into the TTML to provide the base styles on the regions in TTML, which could then be overridden within TTML as appropriate. The TTML rendering area would be the same size and shape as the area allotted to the video, but in the next z-plane above it.

For SRT (or similar unstyled formats) I would define a standard mapping into TTML and provide a default TTML style sheet.
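
As a rough illustration of what such a mapping might look like, here is a minimal sketch in Python. It is not a proposed standard mapping: the function name `srt_to_ttml` and the default style values are assumptions made up for this example, and only cue timing and line breaks are carried over.

```python
import re
from xml.sax.saxutils import escape

# Assumed default styling for unstyled formats; not defined anywhere in this thread.
DEFAULT_STYLE = 'tts:color="white" tts:textAlign="center"'

# SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_ttml(srt_text: str) -> str:
    """Map SRT cues onto TTML <p> elements (illustrative sketch only)."""
    paragraphs = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Find the timing line; a cue-number line usually precedes it.
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue  # no timing line in this block, so skip it
        # SRT uses a comma before the milliseconds, TTML clock time uses a dot.
        begin = f"{match.group(1)}.{match.group(2)}"
        end = f"{match.group(3)}.{match.group(4)}"
        # Each SRT text line becomes a line of the paragraph, separated by <br/>.
        text = "<br/>".join(escape(t) for t in lines[index + 1:])
        paragraphs.append(f'      <p begin="{begin}" end="{end}">{text}</p>')
    return (
        '<tt xmlns="http://www.w3.org/ns/ttml" '
        'xmlns:tts="http://www.w3.org/ns/ttml#styling">\n'
        "  <body>\n"
        f"    <div {DEFAULT_STYLE}>\n"
        + "\n".join(paragraphs)
        + "\n    </div>\n  </body>\n</tt>\n"
    )
```

The default styles sit on the enclosing <div> so that every cue inherits them, which is one way a "default TTML style sheet" could be approximated for formats that carry no styling of their own.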

All that is needed then from the HTML side is an API to discover the available text tracks, and switch them on and off.
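
To make the shape of such an API concrete, here is a small model of it, written in Python purely for brevity (a real binding would be a DOM/script interface). The names `TextTrack`, `MediaElement`, `find_tracks` and `enable_only` are hypothetical and come from no specification. The point of the sketch is that only track metadata and an on/off switch are visible; the caption text itself is never exposed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextTrack:
    """Track metadata only; the caption text is deliberately not exposed."""
    kind: str        # e.g. "captions" or "subtitles"
    language: str    # e.g. "en"
    label: str       # human-readable name for a track menu
    enabled: bool = False


@dataclass
class MediaElement:
    """Hypothetical stand-in for a <video> or <audio> element."""
    text_tracks: List[TextTrack] = field(default_factory=list)

    def find_tracks(self, kind: Optional[str] = None,
                    language: Optional[str] = None) -> List[TextTrack]:
        """Discover the available tracks, optionally filtered."""
        return [
            t for t in self.text_tracks
            if (kind is None or t.kind == kind)
            and (language is None or t.language == language)
        ]

    def enable_only(self, track: TextTrack) -> None:
        """Switch one track on and every other track off."""
        for t in self.text_tracks:
            t.enabled = t is track
```

A page script could then build a caption menu from `find_tracks(kind="captions")` and call `enable_only` on the user's choice, while rendering stays entirely inside the media engine.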

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
Sent: Thursday, March 25, 2010 11:59 PM
To: Sean Hayes
Cc: Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
Subject: Re: Requirements for external text alternatives for audio/video

Hi Sean, all,

Sorry, Sean, if I have mis-interpreted you.

Your statement now actually leads neatly onto a discussion that I have been meaning to have next. Namely: how should the external text support be implemented in HTML?

There has been a bit of a private discussion on this topic before and also at http://lists.w3.org/Archives/Public/public-html-a11y/2009Nov/0120.html,
but I'd like to build on these discussions and put the ideas that were mentioned forward. I may not have captured all the possible implementation ideas and not all the possible aspects, so don't feel limited by this list.

1. Expose it directly in the DOM on-the-fly
This would mean that inside the <track> element (or some other element), we would render the piece(s) of text that is(are) "currently" active together with their styling. This would be <div>-like.

2. Render it in the shadow-DOM on-the-fly
Instead of rendering it directly into the DOM, it could be rendered into the shadow DOM (as anonymous DOM elements), thus avoiding the possibility to manipulate the content through JavaScript.

3. Expose it in an iframe-like construct on-the-fly
Instead of opening the content into the main document's DOM, an <iframe>-like construction could be made. This addresses cross-site security issues.

4. Expose complete in an iframe-like construct
The complete content of the external text track could be parsed into HTML and exposed in an iframe-like construct. There will need to be some introduction of timing elements.

5. Instead of mapping to HTML, introduce a new layout format
Similar to other elements like SVG, we could introduce a new layout format that happens to share some of the layout code with HTML and would also work <iframe>-like. This format would also contain the full content of the external text file, not just the active part, but display activation will be triggered by the video's timeline.

6. Instead of exposing in DOM, provide an attribute on <track>
Developers will need to be able to override provided styling and placement in the external text file (if only to give video and everything around it a corporate look). If we do not allow this through CSS and the DOM, we could provide a property on <track> which provides a standard XML form with the caption data, and post an event when it is time to display a caption. This will give a developer everything they need to display captions in sync with the movie, *and* it allows us to deal with the security violation when a script loaded from one origin tries to access internal captions in a movie loaded from another origin (throw an exception?). Note that this is only an issue with internal captions; external captions can already be loaded with XHR, so we don't need to impose this restriction on them.
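
The event-posting part of option 6 could be sketched roughly as follows. This is a Python model only, not a browser API: the names `Cue`, `CueDispatcher` and `time_update`, and the "enter"/"exit" event names, are assumptions invented for the illustration. The idea is that the media engine reports the playback position and a listener is told when a caption should appear or disappear.

```python
from typing import Callable, Iterable, NamedTuple, Set


class Cue(NamedTuple):
    begin: float  # seconds on the media timeline
    end: float    # seconds; the cue is active while begin <= t < end
    text: str


class CueDispatcher:
    """Posts 'enter'/'exit' events as playback time moves (hypothetical)."""

    def __init__(self, cues: Iterable[Cue],
                 on_event: Callable[[str, Cue], None]) -> None:
        self._cues = list(cues)
        self._on_event = on_event
        self._active: Set[int] = set()  # indices of currently shown cues

    def time_update(self, now: float) -> None:
        """Call whenever the media position changes, including after seeks."""
        current = {
            i for i, cue in enumerate(self._cues)
            if cue.begin <= now < cue.end
        }
        # Cues that were showing but no longer cover the current time.
        for i in sorted(self._active - current):
            self._on_event("exit", self._cues[i])
        # Cues that have just become active.
        for i in sorted(current - self._active):
            self._on_event("enter", self._cues[i])
        self._active = current
```

Because the active set is recomputed from the current time on every call, a seek backwards or forwards produces the right exit and enter events without any special handling. For internal captions from another origin, the same hook is where an implementation could refuse to hand out the cue text.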

Ultimately, I think this list won't have sufficient expertise to discuss this topic, so I would like to take the topic of implementation to the larger HTML5 WG. But since there are several people with experience here, maybe we can get some initial ideas and opinions, so the larger discussion can be more focused.

Fire away with your knowledge / opinions!

Cheers,
Silvia.


On Fri, Mar 26, 2010 at 9:23 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> OK, to be clear I wasn't saying that mapping to CSS is the way things should be done, but only that it is one implementation option. I personally think that the text overlay should be considered as outside of the HTML space in exactly the same way as the video and audio streams are. Captions are a media essence, and have IP rights associated with them. If we integrate the display model into the HTML one, this potentially exposes the caption text to the viewer, and this approach won't work with a protected media file.
>
> Implementing TTML using a private HTML/CSS stack would be fine, but is as I say, just one implementation option.
>
>
> -----Original Message-----
> From: public-html-a11y-request@w3.org [mailto:public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer
> Sent: Thursday, March 25, 2010 8:50 PM
> To: Eric Carlson
> Cc: Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
> Subject: Re: Requirements for external text alternatives for audio/video
>
> Hi Eric,
>
> On Fri, Mar 26, 2010 at 2:46 AM, Eric Carlson <eric.carlson@apple.com> wrote:
>>
>> On Mar 24, 2010, at 9:29 PM, Silvia Pfeiffer wrote:
>>
>> In summary - I would suggest keeping the File Format requirement at
>> http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations#File_Formats
>> with supporting both, srt and dfxp (or ttml as Sean clarified).
>>
>>   What DFXP profile are you suggesting we mandate?
>>   As Maciej noted [1], even the presentation profile requires XSL-FO.
>> Does anyone actually think it is reasonable to require a UA to
>> implement this substantial spec just to style captions?
>> eric
>> [1] -
>> http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0103.html
>
> I believe right now all we need to mandate is the required part of the
> minimum profile
> (http://www.w3.org/TR/ttaf1-dfxp/#profile-dfxp-presentation) - it would conform with WCAG and be extensible to the other features that will certainly be mandated in the future. It looks to me that if other profiles are necessary beyond the ones already given in the TTML specification, these can be developed at a later stage.
>
> Right now we need to take care to find a way to deal with the style and layout specifications. I agree with Sean that this should be done not by implementing the TTML specifications directly, but by mapping them to existing HTML/CSS/JavaScript constructs.
>
>
> Philippe's demo is at http://www.w3.org/2009/02/ThisIsCoffee.html
> with the original TTML file at
> http://www.w3.org/2009/02/ThisIsCoffee61_captions.xml and the JavaScript that interprets it at http://www.w3.org/2008/12/dfxp-testsuite/web-framework/HTML5_player.js.
> The test suite is at
> http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html
> which demonstrates support (or lack of support) for each TTML feature
> - choose the HTML5 player to see what mappings are already supported.
>
> These mappings that are currently done in JavaScript have to be extracted into a specification document. And we need to make sure when we implement support for captions that we can add the features parsed out of TTML into the HTML document.
>
> Cheers,
> Silvia.
>
>
>
Received on Saturday, 27 March 2010 15:56:20 UTC