Re: Requirements for external text alternatives for audio/video

Hi Sean,

comments inline.

On Sat, Apr 3, 2010 at 10:06 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I'm not really drawing a line in the sand, my main concern here is that while these are all interesting ideas, and could indeed have accessibility benefits; they will require a lot of thought to get them right, which will slow us down. The group is already under considerable pressure to get results to the WG and I'd like to see us just get the stuff we know we need to do done first, before we embark on any grand experiments.

SP:
I'm trying to do the opposite, actually - I am trying to move us
forward without inhibiting our future possibilities. Note that right
now I am not proposing to actually introduce hyperlinks. I am just
trying to keep our minds open for such uses of timed text, since it is
an explicit goal of the W3C video on the Web activity and makes sense
in the Web context.


> You seem to have a mental model of how hyperlinks in captions would work  "If you click on such a link, the media resource would be paused together with its dependent tracks (including the captions). As you return to the media resource, it is unpaused and you continue to experience it ", but that's not how hyperlinks work in HTML where navigation is stateless, so it's not clear to me that would be the navigation model. Why wouldn't they in fact navigate the host page? Or the video source, or to another point on the timeline of the video resource without pausing, or the caption source? If they navigate to a timed text resource and the video is paused, then what provides the time-base for the resource you navigate to; if they don't, what kind of resource is it in fact that they navigate to?

SP:
Navigation in HTML is not stateless. Every browser maintains a history
buffer. Thus, while the server maintains no state, the client very
much does.

Any of the navigations that you are listing are indeed possible - they
are just additions to a history buffer. I was trying to say that you
would be able to get back to the previous state as though it had been
paused.
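
To make the history-buffer idea concrete, here is a minimal sketch in
TypeScript. It is only an illustration under my own assumptions: the
names PlaybackSnapshot, HistoryBuffer, follow and back are hypothetical
and are not part of any browser API or of the proposed media API.

```typescript
// Hypothetical sketch: none of these names are a real browser API.
interface PlaybackSnapshot {
  url: string;          // page or resource being shown
  currentTime: number;  // playback position in seconds
  paused: boolean;      // whether playback is paused
}

class HistoryBuffer {
  private entries: PlaybackSnapshot[] = [];

  // Following a link records the state we are leaving, frozen as paused.
  follow(current: PlaybackSnapshot, targetUrl: string): PlaybackSnapshot {
    this.entries.push({ ...current, paused: true });
    return { url: targetUrl, currentTime: 0, paused: false };
  }

  // Going back restores the saved position and resumes playback.
  back(): PlaybackSnapshot | undefined {
    const saved = this.entries.pop();
    return saved ? { ...saved, paused: false } : undefined;
  }
}

// Usage: leave the video at 42.5 seconds, then come back to it.
const historyBuffer = new HistoryBuffer();
const watching: PlaybackSnapshot = {
  url: "video.html",
  currentTime: 42.5,
  paused: false,
};
const linked = historyBuffer.follow(watching, "background.html");
const resumed = historyBuffer.back();
```

The server sees nothing of this; all of the state lives on the client,
which is the point I was making above.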

Also, if you navigate to a timed text resource, you are in fact
looking at text - it is no longer a media resource. If it's TTML, it's
an XML file - no different to e.g. a mediaRSS feed, which is just
regarded as some form of "text" in the browser.


> Whatever the model, it would require specification and trial implementations before we get it right. I'm not saying it couldn't be made to work, just that it's all new stuff that would take time to be specified and built out, and I don't want to hold basic captions to ransom till we figure it out, possibly at the expense of missing the HTML5 boat altogether.

SP:
I've got more than 10 years of experience with exactly this kind of
implementation. The Annodex project at the CSIRO was investigating
exactly such hyperlinked media resources (see our short contribution
to the W3C Video in the Web Workshop at
http://www.w3.org/2007/08/video/positions/annodex.pdf). We've thought
about a lot of the implications and done trial implementations of a
lot more than the small proposal I was making here. Annodex wasn't
based on DFXP/TTML, but on CMML (continuous media markup language),
which is another XML-based time-aligned text markup language - the
principles of that experience still apply today and are one of the
reasons I am an invited expert in the W3C - to share and contribute
from our experiences.

Also, again note that I am not proposing to introduce the whole
framework now - you are quite right that it requires a lot more than
just hyperlinks in DFXP/TTML. Just like you, I am not trying to hold
up basic captions. But I am trying to make sure we don't make
short-sighted decisions that will come back to haunt us later.


> The functionality you are talking about could already in fact be built into the hosting HTML webpage through script, and the proposed media API without having to involve captions so it's not like you won't be able to achieve these things independently.

SP:
All external caption format functionality could be built into the
hosting HTML webpage through JavaScript and indeed has been. That
doesn't mean it's the best way - in fact, all of the implementations I
have seen are non-interoperable.

Further, if I were indeed to replicate the hyperlinking functionality
in HTML through JavaScript, I would also require a format such as
DFXP/TTML extended with hyperlinks. That is why we developed CMML 10
years ago, which had this kind of hyperlinking functionality.

But let's not get into the details of that. It's not productive for
our current discussion.


> TTML was designed to fit as a timed text resource into a wider Web context, such as SMIL or HTML+TIME which are already endowed with such semantics.  Not everything on the web has to be intrinsically interactive, PNG for example. You can make it interactive within a context, for example HTML image maps. This was the philosophy behind the decision to leave linking out of TTML. TTML is primarily designed to be slaved to an external clock source. When the audio and video is made interactive, for which SMIL is probably a better starting point than TTML, then TTML would fit into that world.

SP:
I have no issue with the current state of TTML. It obviously served a
certain purpose and has its history. But just like image maps are
required to provide interactivity for images, some form of
time-aligned text will be necessary to provide outgoing hyperlinks to
media resources. Introducing hyperlinks, incidentally, in no way
changes the requirement to synchronise to the external clock. It is
a good thing that TTML is built to be slaved to an external clock
source and it is necessary for it to stay that way.

If, however, you are saying that TTML should not be extended with a
hyperlink functionality and not be used for anything but captions and
subtitles, I would find that a serious restriction, serious enough to
recommend stepping away from TTML. I don't, however, assume that you
are implying that - just that we should not right now concern
ourselves with introducing hyperlinks into TTML, which is fair enough.

I am not trying to change TTML at this stage nor am I trying to
introduce a hyperlinking requirement on captions. I am only concerned
with the decision we have to make on how to actually render the timed
text into a Web page with a video or audio element.

Thus far, we have come across the following options:

1. Expose text intervals directly in the DOM on-the-fly

2. Render text intervals in the shadow-DOM on-the-fly

3. Expose text intervals in an iframe-like construct on-the-fly

4. Expose complete time-aligned text file content in an iframe-like construct

5. Instead of mapping to HTML, introduce a new layout format

6. Instead of exposing in DOM, provide an attribute on <track> that
contains the complete time-aligned text file content


Of these, I think it would be short-sighted if we chose an option that
would disallow interaction with the text such as through a hyperlink,
which I believe might be the case if we chose the shadow-DOM (option
2), even if it was extended with an attribute on the <track> element
(option 6). But I may be mistaken and maybe one of the browser vendors
can explain in more detail what restrictions the shadow DOM would
introduce and whether it would be a problem for e.g. interactivity.
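
For reference, the "on-the-fly" options (1 to 3) all come down to
selecting, at each moment, the text intervals that are active at the
current playback time and handing them to some rendering surface. A
minimal sketch in TypeScript follows; the TimedCue shape, the
activeCues function and the example.org link are hypothetical and are
not taken from TTML, CMML or any browser API.

```typescript
// Hypothetical cue shape; not taken from TTML, CMML or any browser API.
interface TimedCue {
  start: number;  // seconds on the media timeline, inclusive
  end: number;    // seconds on the media timeline, exclusive
  text: string;
  href?: string;  // optional outgoing hyperlink
}

// Return the cues whose interval contains the given playback time.
function activeCues(cues: TimedCue[], time: number): TimedCue[] {
  return cues.filter((cue) => cue.start <= time && time < cue.end);
}

// Usage: two cues, the second of which carries a hyperlink.
const sampleCues: TimedCue[] = [
  { start: 0, end: 4, text: "Welcome." },
  { start: 4, end: 9, text: "Read more.", href: "http://example.org/more" },
];
const shown = activeCues(sampleCues, 5);
```

Whether the href of an active cue can actually be followed depends on
where the selected text ends up being rendered, which is exactly the
question for options 1 to 3.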


Best Regards,
Silvia.

Received on Saturday, 3 April 2010 12:34:12 UTC