RE: Requirements for external text alternatives for audio/video from Sean Hayes on 2010-04-03 (public-html-a11y@w3.org from April 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sat, 3 Apr 2010 14:02:06 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: Laura Carlson <laura.lee.carlson@gmail.com>, Eric Carlson <eric.carlson@apple.com>, Geoff Freed <geoff_freed@wgbh.org>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>, Matt May <mattmay@adobe.com>, Philippe Le Hegaret <plh@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A47F37F@DB3EX14MBXC301.europe.corp.microsoft.c>
Well, I'm glad you are not intending to add hyperlinks now; that wasn't very clear from the discussion thus far. At an appropriate time I'd love to be engaged in the discussion of the best way to go about that, either by extending TTML in some future incarnation of the TTWG or in another forum.

I don't think introducing TTML/SRT as a closed system and not exposing it to HTML in any way closes options for the future. What's more, it doesn't leave a legacy we might have to back out of. Right now I'm not seeing any need to expose the text outside of the TTML/SRT rendering engine into the real DOM. If by "Shadow DOM" (option 2) you mean making it available to AT, then I'm OK with that.

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: Saturday, April 03, 2010 1:33 PM
To: Sean Hayes
Cc: Laura Carlson; Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
Subject: Re: Requirements for external text alternatives for audio/video

Hi Sean,

comments inline.

On Sat, Apr 3, 2010 at 10:06 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I'm not really drawing a line in the sand. My main concern here is that while these are all interesting ideas, and could indeed have accessibility benefits, they will require a lot of thought to get them right, which will slow us down. The group is already under considerable pressure to get results to the WG and I'd like to see us just get the stuff we know we need to do done first, before we embark on any grand experiments.

SP:
I'm trying to do the opposite, actually - I am trying to move us forward without inhibiting our future possibilities. Note that right now I am not proposing to actually introduce hyperlinks. I am just trying to keep our minds open for such uses of timed text, since it is an explicit goal of the W3C video on the Web activity and makes sense in the Web context.


> You seem to have a mental model of how hyperlinks in captions would work: "If you click on such a link, the media resource would be paused together with its dependent tracks (including the captions). As you return to the media resource, it is unpaused and you continue to experience it", but that's not how hyperlinks work in HTML, where navigation is stateless, so it's not clear to me that would be the navigation model. Why wouldn't they in fact navigate the host page? Or the video source, or to another point on the timeline of the video resource without pausing, or the caption source? If they navigate to a timed text resource and the video is paused, then what provides the time-base for the resource you navigate to; if they don't, what kind of resource is it in fact that they navigate to?

SP:
Navigation in HTML is not stateless. Every browser maintains a history buffer. Thus, while the server maintains no state, the client very much does.

Any of the navigations that you are listing are indeed possible - they are just additions to a history buffer. I was trying to say that you would be able to get back to the previous state as though it was paused.

Also, if you navigate to a timed text resource, you are in fact looking at text - it is no longer a media resource. If it's TTML, it's an XML file - no different to e.g. a Media RSS feed, which is just regarded as some form of "text" in the browser.


> Whatever the model, it would require specification and trial implementations before we get it right. I'm not saying it couldn't be made to work, just that it's all new stuff that would take time to be specified and built out, and I don't want to hold basic captions to ransom till we figure it out, possibly at the expense of missing the HTML5 boat altogether.

SP:
I've got more than 10 years of experience with exactly this kind of implementation. The Annodex project at the CSIRO was investigating exactly such hyperlinked media resources (see our short contribution to the W3C Video in the Web Workshop at http://www.w3.org/2007/08/video/positions/annodex.pdf). We've thought about a lot of the implications and done trial implementations of a lot more than the small proposal I was making here. Annodex wasn't based on DFXP/TTML, but on CMML (Continuous Media Markup Language), which is another XML-based time-aligned text markup language. The principles of that experience still apply today and are one of the reasons I am an invited expert in the W3C - to share and contribute from our experiences.

Also, again note that I am not proposing to introduce the whole framework now - you are quite right that it requires a lot more than just hyperlinks in DFXP/TTML. Just like you, I am not trying to hold up basic captions. But I am trying to make sure we don't make short-sighted decisions that will come back to haunt us later.


> The functionality you are talking about could in fact already be built into the hosting HTML webpage through script and the proposed media API, without having to involve captions, so it's not like you won't be able to achieve these things independently.

SP:
All external caption format functionality could be built into the hosting HTML webpage through JavaScript, and indeed has been. That doesn't mean it's the best way - in fact, all of the implementations I have seen are non-interoperable.

Further, if I were indeed to replicate the hyperlinking functionality in HTML through JavaScript, I would also require a format such as DFXP/TTML extended with hyperlinks. That is why we developed CMML 10 years ago, which had this kind of hyperlinking functionality.

But let's not get into the details of that. It's not productive for our current discussion.


> TTML was designed to fit as a timed text resource into a wider Web context, such as SMIL or HTML+TIME, which are already endowed with such semantics. Not everything on the web has to be intrinsically interactive; PNG, for example, is not. You can make it interactive within a context, for example HTML image maps. This was the philosophy behind the decision to leave linking out of TTML. TTML is primarily designed to be slaved to an external clock source. When the audio and video is made interactive, for which SMIL is probably a better starting point than TTML, then TTML would fit into that world.

SP:
I have no issue with the current state of TTML. It obviously served a certain purpose and has its history. But just like image maps are required to provide interactivity for images, some form of time-aligned text will be necessary to provide outgoing hyperlinks to media resources. Introducing hyperlinks, incidentally, in no way, shape or form changes the synchronisation requirement to the external clock. It is a good thing that TTML is built to be slaved to an external clock source and it is necessary for it to stay that way.

If, however, you are saying that TTML should not be extended with a hyperlink functionality and not be used for anything but captions and subtitles, I would find that a serious restriction, serious enough to recommend stepping away from TTML. I don't, however, assume that you are implying that - just that we should not right now concern ourselves with introducing hyperlinks into TTML, which is fair enough.

I am not trying to change TTML at this stage nor am I trying to introduce a hyperlinking requirement on captions. I am only concerned with the decision we have to make on how to actually render the timed text into a Web page with a video or audio element.


Thus far, we have come across the following options:

1. Expose text intervals directly in the DOM on-the-fly

2. Render text intervals in the shadow-DOM on-the-fly

3. Expose text intervals in an iframe-like construct on-the-fly

4. Expose complete time-aligned text file content in an iframe-like construct

5. Instead of mapping to HTML, introduce a new layout format

6. Instead of exposing in DOM, provide an attribute on <track> that contains the complete time-aligned text file content


Of these, I think it would be short-sighted if we chose an option that would disallow interaction with the text such as through a hyperlink, which I believe might be the case if we chose the shadow-DOM (option 2), even if it was extended with an attribute on the <track> element (option 6). But I may be mistaken, and maybe one of the browser vendors can explain in more detail what restrictions the shadow DOM would introduce and whether it would be a problem for e.g. interactivity.


Best Regards,
Silvia.
Received on Saturday, 3 April 2010 14:02:50 UTC