Re: Requirements for external text alternatives for audio/video

Hi Sean,

I have hesitated to reply to this email simply because you are drawing
a line in the sand between what is and what isn't accessibility for
media, and I don't quite follow where you draw it. The line between
accessibility, usability and interactivity for Web media resources is
IMO not as clear-cut as you make it out to be.

It is true: some media accessibility topics are clear and "well
understood" as you say: captions, audio descriptions and transcripts.
These originated in traditional audio-visual systems, in
particular TV, where there are no hyperlinks. Does this mean that
hyperlinks should be excluded from captions when we take captions to
the Web? Does this mean that DAISY-style interactions should not be
allowed for media resources on the Web, because we are modeling our
view of media on the Web on traditional media?

When we took text documents from the desktop to the Web, the one thing
that made the difference, and that defined Web documents, was the
hyperlink. We have even seen the reverse happen: Word documents, which
are traditional desktop text documents, now include hyperlinks. I
strongly believe that in the future we will see hyperlinks in captions
and subtitles on interactive TV. Why should we not start with such a
"revolutionary" concept on the Web?

Further, I would be very disappointed if a "timed text markup
language" were nothing but a "caption format". A "markup language" has
traditionally stood for marking up hyperlinked resources. I honestly
cannot explain to my fellow Web developers why the W3C would develop a
TTML without hyperlinks. They go "but it's the Web!...?" and "but it's
the W3C!...?" and I have no answers other than stating that DFXP
wasn't really developed for a Web context. But I believe we can
fix this.

It is true - traditional captions and subtitles don't have hyperlinks,
and such captions can continue to be used. But why not also introduce
"modern" captions - captions that do have hyperlinking functionality?
Are you concerned that people who use captions will get more
functionality from them than people who do not turn captions on? Are
you concerned that there will be useful relationships represented in
captions that people who do not use captions will not receive? I say:
that's a good thing! For once, give hard-of-hearing (HoH) users an
advantage over those who aren't. And let them decide what good quality
captions are - they can always turn off captions whose hyperlinks they
find unacceptable, or choose an alternative without hyperlinks.

But let me address your objections:

* "audio is not interactive"

To this I would say: "not yet". As we introduce cue ranges or similar
concepts, we will be able to introduce interactivity into audio and
video resources. This introduction is also absolutely required: we are
on the Web here and not on TV. Hyperlinking and interactivity are core
to the Web. It is part of the whole "W3C Video on the Web" activity
that we are part of, see http://www.w3.org/2007/08/video/ . Thus, with
the introduction of interactivity to audio and video resources, there
may well be a need to introduce a type of timed text track that
provides alternatives for that interactivity - and we know from
transcripts that hyperlinks are an important way of linking to text
alternatives.
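
To make "cue ranges or similar concepts" a little more concrete, here is a
minimal TypeScript sketch. CueRange and CueRangeTracker are hypothetical
names made up for illustration (not a browser API); the sketch assumes the
player feeds it the current playback time and shows how enter/exit
callbacks on time ranges give a hook for time-bound interactivity, such as
exposing a link only while its caption is showing.

```typescript
// Hypothetical sketch: CueRange and CueRangeTracker are illustrative names,
// not part of any browser API.
interface CueRange {
  id: string;
  start: number; // seconds, inclusive
  end: number; // seconds, exclusive
  onEnter: (id: string) => void;
  onExit: (id: string) => void;
}

class CueRangeTracker {
  private active: string[] = [];

  constructor(private readonly ranges: CueRange[]) {}

  // Call with the media's current playback time (e.g. from a time-update
  // handler). Fires onExit for ranges that were left and onEnter for ranges
  // that were entered since the previous call.
  update(currentTime: number): void {
    for (const r of this.ranges) {
      const inside = currentTime >= r.start && currentTime < r.end;
      const wasInside = this.active.indexOf(r.id) !== -1;
      if (inside && !wasInside) {
        this.active.push(r.id);
        r.onEnter(r.id);
      } else if (!inside && wasInside) {
        this.active = this.active.filter((id) => id !== r.id);
        r.onExit(r.id);
      }
    }
  }

  // IDs of the ranges that contain the most recent playback time.
  activeIds(): string[] {
    return this.active.slice();
  }
}
```

A player would call update() from its time-update handling; a caption cue
carrying a hyperlink could then make that link available in onEnter and
withdraw it in onExit.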


* "Adding interactivity to captions would break the semantic idea that
they match the audio."

I do not see what the introduction of hyperlinks into captions has to
do with breaking the "matching to the audio". Hyperlinks are URLs that
are placed behind sections of text. That this text happens to be a
caption, i.e. a segment of text that is a (mostly) literal transcript
of the audio in the video (or audio), does not destroy the "matching to
the audio". If you click on such a link, the media resource would be
paused together with its dependent tracks (including the captions). As
you return to the media resource, it is unpaused and you continue to
experience it. There is no breakage.
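
The pause-and-resume behaviour described above can be modelled in a few
lines. This is a minimal TypeScript sketch under stated assumptions:
MediaPlayer, TimedTrack, followLink() and returnFromLink() are hypothetical
names, not an existing API, and "following" a link is reduced to a return
value. It shows that playback position and the caption track stay in step
across the detour.

```typescript
// Hypothetical model of the behaviour described above. MediaPlayer and
// TimedTrack are illustrative names, not an existing API.
interface TimedTrack {
  kind: string; // e.g. "captions"
  paused: boolean;
}

class MediaPlayer {
  paused = false;
  private pausedForLink = false;

  constructor(public currentTime: number, readonly tracks: TimedTrack[]) {}

  // Playback time only advances while the player is not paused.
  advance(seconds: number): void {
    if (!this.paused) this.currentTime += seconds;
  }

  // Following a caption link pauses the media together with all of its
  // dependent tracks, so the captions stay aligned with the audio.
  followLink(url: string): string {
    if (!this.paused) {
      this.setPaused(true);
      this.pausedForLink = true;
    }
    return url; // a real player would open the linked resource here
  }

  // Coming back resumes playback from the same position, but only if it
  // was the link (not the user) that paused it.
  returnFromLink(): void {
    if (this.pausedForLink) {
      this.setPaused(false);
      this.pausedForLink = false;
    }
  }

  private setPaused(paused: boolean): void {
    this.paused = paused;
    for (const track of this.tracks) track.paused = paused;
  }
}
```

Tracking pausedForLink separately means a user who had already paused the
video before clicking a link is not surprised by playback resuming on
return.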


* "Adding interactivity to captions <..> could end up being badly
abused and confusing for the user."

This is not something that can be controlled - ever. Abuse has
happened on the Web since its beginning - hyperlinks can always point
to the wrong content, bad content or confusing content. It's not a
reason to remove hyperlinks from the Web and it shouldn't be an
argument to stop their introduction into timed text.


* "Adding interactivity to captions <..> could end up introducing
unnecessary security and social engineering issues."

There is indeed a security problem that we have to solve if we allow
text from a third-party server to be interpreted and displayed in a
Web page from a different server. But it has nothing to do with
introducing hyperlinks - having hyperlinks in captions creates no new
security issues.


* "Captions should be as near as possible the exact equivalent of the
audio, with adequate typography to be easily readable."

That can be achieved while also providing hyperlinks. The two do not
contradict each other.


Note that I would be open to introducing a different type of timed
text track that is more interactive, along the lines that you outline
for ATVEF. I believe it does not require JavaScript injection,
but that is certainly something to discuss.

The key points I wanted to make that strongly relate to this
discussion though are:

* introducing hyperlinks in a timed text format is not difficult, but
very powerful, and I do not see a reason why caption and subtitle
files should be excluded from such functionality.

* whichever powerful timed text format we propose should allow for hyperlinks.

* this debate is important for the decision on how to implement
captions & subtitles - if we choose an implementation that will not
allow us to expose, e.g., hyperlinks, then it will restrict what
we can do in the future when we want more powerful timed text. While
right now, with captions and subtitles - in particular in SRT - there
is no need for anything fancy and we can just hide it all in a shadow
DOM, this may prove to be a short-sighted decision in the future. I am
just pointing out the bigger picture that we need to concern ourselves
with.
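
To illustrate how little is needed to carry links in a cue, here is a
TypeScript sketch that assumes a hypothetical cue syntax in which links are
written as <a href="...">...</a> inside the cue text. That markup is an
assumption for illustration only: it is not part of SRT, and nothing here
establishes whether DFXP/TTML supports it. The sketch ignores entity
decoding and nested markup. It splits a cue into plain and linked spans and
shows that dropping the links gives back exactly the caption text.

```typescript
// Hypothetical: assumes cue text may contain <a href="...">...</a> markup.
// This syntax is an illustration, not a feature of SRT or published TTML.
interface CueSpan {
  text: string;
  href?: string; // present only for linked spans
}

// Split raw cue text into plain spans and linked spans, in order.
function parseCue(raw: string): CueSpan[] {
  const link = /<a\s+href="([^"]*)">([\s\S]*?)<\/a>/g;
  const spans: CueSpan[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = link.exec(raw)) !== null) {
    if (m.index > last) spans.push({ text: raw.slice(last, m.index) });
    spans.push({ text: m[2], href: m[1] });
    last = link.lastIndex;
  }
  if (last < raw.length) spans.push({ text: raw.slice(last) });
  return spans;
}

// The caption text itself is unchanged by the links, so a renderer that
// ignores href (or a user who prefers link-free captions) still gets text
// that matches the audio.
function plainText(spans: CueSpan[]): string {
  return spans.map((s) => s.text).join("");
}
```

Whether such spans end up as real, focusable links or are flattened inside
a shadow DOM is exactly the exposure question raised in the last point.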

Thus, I see direct and indirect relationships to accessibility issues
in the hyperlinks discussion. It will not derail the whole concept of
captions and subtitles if we don't let it. But it will help us make
better decisions.

Best Regards,
Silvia.


On Tue, Mar 30, 2010 at 4:39 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I'm fully aware of what can be done with an interactive media system, I've worked on dozens of them over the last 20 odd years; what I'm saying is that you are trying to insert functionality here that the <video> and <audio> tag were not scoped for in HTML5, and doing so under the guise of accessibility seems to me somewhat contrived.
>
> There are a few well understood modes of accessibility for media which we need to address with high priority: captions, description, and transcript. Captions are a time based text equivalent to audio, audio is not interactive; and neither should the captions be. Adding interactivity to captions would break the semantic idea that they match the audio, and could end up being badly abused and confusing for the user, as well as introducing unnecessary security and social engineering issues. Captions should be as near as possible the exact equivalent of the audio, with adequate typography to be easily readable. Captions also belong to the media, and so if any branding is to be supplied then it should be matched to the video content, not to the player, and it would be up to the content owner to supply the styling. Such branding should not be at the expense of readability. A similar argument would apply to subtitles. The caption text needs to be available to assistive technology, but that does imply that the HTML author needs to get involved to make that happen.
>
> Now if you want to introduce interactive media into HTML5, without invoking the full SMIL model, then you could certainly define another kind of timed track, perhaps along the lines of ATVEF, which creates javascript events, and carries a payload which could be injected into the HTML DOM; this is quite powerful enough to do all the things you list and more, and I'd be happy to contribute to a debate on the pros and cons of such a model vs SMIL. However that debate should not be part of an accessibility discussion, and if we have it here I think there is a very real danger of derailing the whole concept of caption and subtitle support in HTML5.
>
> -----Original Message-----
> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
> Sent: Monday, March 29, 2010 12:19 AM
> To: Sean Hayes
> Cc: Laura Carlson; Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
> Subject: Re: Requirements for external text alternatives for audio/video
>
> On Mon, Mar 29, 2010 at 7:38 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
>> I don't disagree with the need to provide appropriate alternatives to media, but the mechanism of providing a transcript is perhaps not best provided through the mechanism of trapping captions. As you say, captions would in fact probably not be an adequate replacement for the media without the text of description being included at minimum. Thus a transcript is more like the alt text on an image, a different semantic beast than captions, and probably better provided by other means.
>>
> I think there is an important larger issue here. Is the text mechanism intended to provide captions and subtitles; or is the intention, as Silvia's examples would seem to suggest, to use it to turn HTML5 into a time based media like SMIL or HTML+TIME? If the latter, and this mechanism is intended to address corporate branding and advertising, then I think we are straying out of the remit of accessibility into something much larger which would need to be taken up in the wider group.
>
> The two examples that you are providing are two extremes:
> captions/subtitles on the one end, and SMIL/HTML+Time on the other.
> Right now and for the purposes of this group we are focused on captions/subtitles. But already with the features of DFXP there is a possibility to go a step further, without going all the way to the complexity of SMIL/HTML+Time - which, IMO, needs to come in at a different level.
>
> What I was describing is simply time-aligned text that is a bit more capable than just being plain text. In particular I am talking about hyperlinks, which are essentially nothing more than styled text, but provide Web functionality - something that should be very important to us in the given context. This has nothing to do with going all the way to SMIL/HTML+Time. It is still no more than captions or subtitles, but with the possibility of linking out at a given time.
>
> Think about it: we could have captions that allow us to explain things further - e.g. a movie about a historic event with names of people mentioned and you could click through on the names of the people and find out what they were really like and why they are portrayed as they are in the movie. Directly related "supplementary material" - not banished to a separate resource as it currently is on DVDs. Actually available at your fingertips when you are interested in it.
>
> Or we could have captions of a political discussion with links to explain some background on the speakers.
>
> Or we could have captions that would link to a dictionary entry for words that are used very infrequently in a language.
>
> Or, of course, we could have links in ads to the eCommerce site of the current ad, so we can directly go and purchase the product.
>
> This is not difficult to do on top of what we have right now, but requires the ability to at least interact with links inside timed text.
>
> Note that I am not even sure if current DFXP/TTML supports hyperlinks, but if it doesn't I would be very keen on introducing them because they are extremely useful. Since DFXP/TTML is declared as being easily extensible, that should not be so hard to do.
>
> Regards,
> Silvia.
>
>

Received on Saturday, 3 April 2010 09:14:14 UTC