Re: Requirements for external text alternatives for audio/video from Silvia Pfeiffer on 2010-03-27 (public-html-a11y@w3.org from March 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sun, 28 Mar 2010 09:50:32 +1100
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: Eric Carlson <eric.carlson@apple.com>, Geoff Freed <geoff_freed@wgbh.org>, HTML Accessibility Task Force <public-html-a11y@w3.org>, Matt May <mattmay@adobe.com>, Philippe Le Hegaret <plh@w3.org>
Message-ID: <2c0e02831003271550l31a7d39an1affd3b7fb55831@mail.gmail.com>
I think there is actually a need to expose the text of the captions
somewhat more.

What if a corporation would like to put everything on a Website in its
own styling, including the video player and the captions? Something
like the animated lyrics demo displayed here:
http://svg-wow.org/audio/animated-lyrics.html . There would definitely
need to be access to the text - either inline or through JavaScript.

Also, what about a synchronised text transcript that is shown in full
beside the video while captions are displayed on top of the video? If
I click on a caption, I would want the transcript to scroll to the
right offset and continue highlighting the active text from there. In
fact, what user interactions do we allow with captions?
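
As a rough illustration of the click-to-scroll interaction, the sketch
below finds the cue active at a given playback time and its character
offset in the full transcript. The cue list, the `activeCueIndex` and
`transcriptOffset` names, and the single-space join between cues are
all assumptions made for the example; none of them come from a
specification discussed in this thread.

```javascript
// Hypothetical cue list; in practice it would come from the caption file.
const cues = [
  { start: 0.0, end: 2.5, text: "This is coffee." },
  { start: 2.5, end: 6.0, text: "It is grown in many countries." },
  { start: 7.0, end: 9.0, text: "Here is how it is made." },
];

// Return the index of the cue active at `time` (in seconds), or -1 if
// no cue is active. Assumes cues are sorted by start time and do not
// overlap, so a binary search is sufficient.
function activeCueIndex(cues, time) {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (time < cues[mid].start) {
      hi = mid - 1;
    } else if (time >= cues[mid].end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}

// Character offset of cue `index` within the full transcript, assuming
// the transcript is the cue texts joined by a single space.
function transcriptOffset(cues, index) {
  let offset = 0;
  for (let i = 0; i < index; i++) {
    offset += cues[i].text.length + 1;
  }
  return offset;
}
```

A page script could call `activeCueIndex` with the video's current time
on each time update and use `transcriptOffset` to decide where to scroll
and which span of the transcript to highlight.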

Will it be possible to put URLs inside captions that people can
activate? I believe this is a very important use case. It is important
for linking to related content - and ultimately important for business
models around advertising overlaid on video.

It makes sense to hide pixels and audio samples (though that is being
increasingly questioned, too), because they are binary data, which is
not what the Web is built around. But hiding text from the Web
developer does not seem the right thing to do on the Web.

The Web and its functionality are all about having text available on
the page to be able to do things with. If we don't expose it, people
will simply go back to the original XML file and ignore all APIs that
we develop inside the browser. They will parse out text from there and
do things with it in their own JavaScript libraries. Is that really
what we want?
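
To make the point concrete, here is roughly what such a home-grown
library might do: a deliberately naive sketch that pulls timed text out
of a TTML-like file with a regular expression. The sample markup, the
`extractCues` and `clockToSeconds` names, and the restriction to
`hh:mm:ss` clock times on `<p>` elements whose `begin` attribute comes
directly before `end` are all assumptions for illustration. A real
library would use a proper XML parser and handle namespaces, styling,
and the other time expressions TTML allows.

```javascript
// Hypothetical TTML-like fragment; namespaces, styling and regions omitted.
const ttml = `
<tt>
  <body><div>
    <p begin="00:00:00.00" end="00:00:02.50">This is <span>coffee</span>.</p>
    <p begin="00:00:02.50" end="00:00:06.00">It is grown in many countries.</p>
  </div></body>
</tt>`;

// Convert an "hh:mm:ss.fraction" clock time to seconds.
function clockToSeconds(value) {
  const [h, m, s] = value.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Naive extraction: one cue per <p begin="..." end="...">, with any
// inline markup stripped from the text.
function extractCues(xml) {
  const cues = [];
  const pattern = /<p\s+begin="([^"]+)"\s+end="([^"]+)"[^>]*>([\s\S]*?)<\/p>/g;
  for (const match of xml.matchAll(pattern)) {
    cues.push({
      start: clockToSeconds(match[1]),
      end: clockToSeconds(match[2]),
      text: match[3].replace(/<[^>]+>/g, "").trim(),
    });
  }
  return cues;
}
```

Every site doing this would carry its own partial parser like the one
above, which is exactly the duplication a browser-side API could avoid.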

Cheers,
Silvia.


On Sat, Mar 27, 2010 at 9:13 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I don't think any exposure of the caption text should be made to HTML. There is afaik no exposure to the pixels of an img, the samples of audio, or the frames of a video. Captions and subtitles should be, from the perspective of HTML, a black box media essence in the same manner. This will obviate any IP, security or cross domain issues. So for me #5 is the closest starting point, but nothing needs to show up in the HTML DOM, and we don't have to define any new layout format or triggering; since TTML specifies all that already (and could be extended if something is missing).
>
> The normal CSS inheritance rules apply to the parent <video>, or <audio> element. I would suggest that we define that those styles would then inherit down into the TTML to provide the base styles on the regions in TTML, which could then be overridden within TTML as appropriate. The TTML rendering area would be the same size and shape as the area allotted to the video, but in the next z-plane above it.
>
> For SRT (or similar unstyled formats) I would define a standard mapping into TTML and provide a default TTML style sheet.
>
> All that is needed then from the HTML side is an API to discover the available text tracks, and switch them on and off.
>
> -----Original Message-----
> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
> Sent: Thursday, March 25, 2010 11:59 PM
> To: Sean Hayes
> Cc: Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
> Subject: Re: Requirements for external text alternatives for audio/video
>
> Hi Sean, all,
>
> Sorry, Sean, if I have mis-interpreted you.
>
> Your statement now actually leads neatly onto a discussion that I have been meaning to have next, namely: how should the external text support be implemented in HTML?
>
> There has been a bit of a private discussion on this topic before and also at http://lists.w3.org/Archives/Public/public-html-a11y/2009Nov/0120.html,
> but I'd like to build on these discussions and put the ideas that were mentioned forward. I may not have captured all the possible implementation ideas and not all the possible aspects, so don't feel limited by this list.
>
> 1. Expose it directly in the DOM on-the-fly
> This would mean that inside the <track> element (or some other element), we would render the piece(s) of text that is(are) "currently" active together with their styling. This would be <div>-like.
>
> 2. Render it in the shadow-DOM on-the-fly
> Instead of rendering it directly into the DOM, it could be rendered into the shadow DOM (as anonymous DOM elements), thus preventing manipulation of the content through JavaScript.
>
> 3. Expose it in an iframe-like construct on-the-fly
> Instead of opening the content into the main document's DOM, an <iframe>-like construct could be made. This addresses cross-site security issues.
>
> 4. Expose it completely in an iframe-like construct
> The complete content of the external text track could be parsed into HTML and exposed in an iframe-like construct. Some timing elements would need to be introduced.
>
> 5. Instead of mapping to HTML, introduce a new layout format
> Similar to other elements like SVG, we could introduce a new layout format that happens to share some of the layout code with HTML and would also work <iframe>-like. This format would also contain the full content of the external text file, not just the active part, but display activation would be triggered by the video's timeline.
>
> 6. Instead of exposing in DOM, provide an attribute on <track>
> Developers will need to be able to override the styling and placement provided in the external text file (if only to give video and everything around it a corporate look). If we do not allow this through CSS and the DOM, we could provide a property on <track> that exposes the caption data in a standard XML form, and post an event when it is time to display a caption. This will give a developer everything they need to display captions in sync with the movie, *and* it allows us to deal with the security violation when a script loaded from one origin tries to access internal captions in a movie loaded from another origin (throw an exception?). Note that this is only an issue with internal captions; external captions can already be loaded with XHR, so we don't need to impose this restriction on them.
>
> Ultimately, I think this list won't have sufficient expertise to discuss this topic, so I would like to take the topic of implementation to the larger HTML5 WG. But since there are several people with experience here, maybe we can get some initial ideas and opinions, so the larger discussion can be more focused.
>
> Fire away with your knowledge / opinions!
>
> Cheers,
> Silvia.
>
>
> On Fri, Mar 26, 2010 at 9:23 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
>> OK, to be clear I wasn't saying that mapping to CSS is the way things should be done, but only that it is one implementation option. I personally think that the text overlay should be considered as outside of the HTML space in exactly the same way as the video and audio streams are. Captions are a media essence, and have IP rights associated with them. If we integrate the display model into the HTML one, this potentially exposes the caption text to the viewer, and this approach won't work with a protected media file.
>>
>> Implementing TTML using a private HTML/CSS stack would be fine, but is, as I say, just one implementation option.
>>
>>
>> -----Original Message-----
>> From: public-html-a11y-request@w3.org
>> [mailto:public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer
>> Sent: Thursday, March 25, 2010 8:50 PM
>> To: Eric Carlson
>> Cc: Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le
>> Hegaret
>> Subject: Re: Requirements for external text alternatives for
>> audio/video
>>
>> Hi Eric,
>>
>> On Fri, Mar 26, 2010 at 2:46 AM, Eric Carlson <eric.carlson@apple.com> wrote:
>>>
>>> On Mar 24, 2010, at 9:29 PM, Silvia Pfeiffer wrote:
>>>
>>> In summary - I would suggest keeping the File Format requirement at
>>> http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations#File_Formats
>>> with support for both SRT and DFXP (or TTML, as Sean clarified).
>>>
>>>   What DFXP profile are you suggesting we mandate?
>>>   As Maciej noted [1], even the presentation profile requires XSL-FO.
>>> Does anyone actually think it is reasonable to require a UA to
>>> implement this substantial spec just to style captions?
>>> eric
>>> [1] -
>>> http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0103.html
>>
>> I believe right now all we need to mandate is the required part of the
>> minimum profile
>> (http://www.w3.org/TR/ttaf1-dfxp/#profile-dfxp-presentation) - it would conform with WCAG and be extensible to the other features that will certainly be mandated in the future. It looks to me that if other profiles are necessary beyond the ones already given in the TTML specification, these can be developed at a later stage.
>>
>> Right now we need to take care to find a way to deal with the style and layout specifications. I agree with Sean that this should be done not by implementing the TTML specifications directly, but by mapping them to existing HTML/CSS/JavaScript constructs.
>>
>>
>> Philippe's demo is at http://www.w3.org/2009/02/ThisIsCoffee.html
>> with the original TTML file at
>> http://www.w3.org/2009/02/ThisIsCoffee61_captions.xml and the JavaScript that interprets it at http://www.w3.org/2008/12/dfxp-testsuite/web-framework/HTML5_player.js.
>> The test suite is at
>> http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html
>> which demonstrates support (or lack of support) for each TTML feature
>> - choose the HTML5 player to see what mappings are already supported.
>>
>> These mappings that are currently done in JavaScript have to be extracted into a specification document. And we need to make sure when we implement support for captions that we can add the features parsed out of TTML into the HTML document.
>>
>> Cheers,
>> Silvia.
>>
>>
>>
>
>
Received on Saturday, 27 March 2010 22:51:26 UTC