Re: Change Proposals toward Issue-9: "how accessibility works for <video> is unclear"

Hi Henri,

On Tue, Apr 13, 2010 at 1:01 AM, Henri Sivonen wrote:
> "Silvia Pfeiffer" wrote:
>> Change Proposal 1:
> I think this API should fire events when the user uses the UA-provided UI (e.g. context menu) to enable or disable a track. This way, author-supplied playback UI can stay in sync when the user uses the UA-supplied UI to change the track state.
> I think the names used by this API should align with the names used by the <track> element. Comments on the attribute names on the <track> element below.

Yes, I totally agree.
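
To make the sync requirement concrete, here is a minimal sketch of what
such notifications could look like. It is only an illustration under my
own assumptions: TrackList, onChange and setEnabled are hypothetical
names, not part of either change proposal, and a plain object stands in
for the author-supplied UI.

```javascript
// Hypothetical sketch: none of these names come from the proposals.
class TrackList {
  constructor(tracks) {
    // Each track is { name, enabled }; copy so callers keep their own objects.
    this.tracks = tracks.map((t) => ({ ...t }));
    this.listeners = [];
  }

  // Register a callback that is invoked on every actual state change.
  onChange(listener) {
    this.listeners.push(listener);
  }

  // Called by the UA-provided UI as well as by author script, so both
  // paths notify the same listeners.
  setEnabled(name, enabled) {
    const track = this.tracks.find((t) => t.name === name);
    if (!track || track.enabled === enabled) return false; // nothing changed
    track.enabled = enabled;
    for (const listener of this.listeners) listener({ name, enabled });
    return true;
  }
}

const trackList = new TrackList([
  { name: "English captions", enabled: false },
  { name: "German subtitles", enabled: false },
]);

// The author-supplied UI mirrors whatever state it is told about.
const authorUiState = {};
trackList.onChange(({ name, enabled }) => {
  authorUiState[name] = enabled;
});

// Simulate the user picking a track from the UA context menu.
trackList.setEnabled("English captions", true);
```

The point of routing both the UA UI and script through the same setter
is that the author's controls never have to poll for changes.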

>> Change Proposal 2:
>> This proposal introduces declarative markup to associate external
>> timed text resources (such as captions and subtitles) with a media
>> resource. It introduces <track> and <trackgroup> elements to be used
>> inside media elements and provides some recommendations on how to
>> render the text resources.
> The name <track> is an improvement over <text> or <itext>. Thanks. I feel the attribute names need some bikeshedding, though:
>  * src: This is good.
>  * name: Since this is a human-readable UI string, please call it @title. Using @title would be consistent with the way style sheet naming works. (@name is the old name for @id.)

We've had this discussion elsewhere. I still think @name is more
appropriate, since @title already exists as a default attribute on all
elements. We used @name in order to stay in sync with the Multitrack
API, which exposes a name that is given to the track inside the media
resource and thus cannot use @title.

>  * role: Please find another name for this. @role is taken by ARIA for overriding how an element is exposed to accessibility APIs. This attribute isn't about overriding the native HTML semantic. This attribute is a native HTML semantic, and the value space of this attribute is different from the value space of @role as used by ARIA.

I am not sure about this. @role was chosen on purpose for
accessibility reasons and in this case should be the same as the ARIA
attribute. But if there is a reason to separate them, that's ok.

>  * type: This is good.
>  * media: This is good.
>  * language: Please rename this to hreflang for consistency with pre-existing HTML elements that have an attribute that designates the natural language of an external resource.

This was under discussion as well and we ended up with @language since
it is more human-readable and it would be nicer to be in sync with
the Multitrack API here, too. For the external reference, @hreflang
would make sense, but it doesn't work for the Multitrack API.

> In order to integrate with the security model of the Web, I think captions should be rendered using a nested browsing context whose origin is the URL of the text track file instead of using a <div> like area.

The actual implementation has indeed not been specified yet and it is
good to get ideas like this. With "<div>-like area" we only described
the display itself. It is entirely possible to have that inside an
<iframe>-like area such as you are describing.

> I think TTML support shouldn't be required. On the contrary, I think implementing support for it should be actively discouraged in order to avoid making the Web depend on TTML support.
> The text formatting model of the Web is CSS. I think it is counter-productive to introduce a format that is targeted at Web use but reinvents large parts of CSS. Furthermore, CSS formatting operates on a DOM. Since TTML is an XML vocabulary, it maps to a DOM. However, rendering this DOM using a CSS formatter would show all the timed text at the same time, so it would have to be specified how the formatting works over time. TTML isn't defined in terms of CSS Animations, which deals with the problem of changing CSS properties as a function of time. Finally, TTML introduces 7 XML namespaces and uses namespaced attributes.

I'd like to keep the discussion of the supported external formats
separate from the mechanism for association.

I know that TTML is very controversial and possibly doesn't satisfy
all the needs. But it seems to have become an industry standard
elsewhere. I personally am also unhappy about it and may get involved
in developing a new format (see discussion on WHATWG). But I don't
think we can completely ignore TTML. There are plenty of things to
resolve for TTML first though so it will be difficult if not
impossible to require TTML support from the start.

>> Should they be included in plain sight in the DOM? Should they be included
>> in a shadow DOM?
> It would be good if the accessibility TF didn't overload the term "shadow DOM" for non-XBL purposes.

A WebKit developer used this term and explained it to me as being
elements that are in the DOM but not accessible by JavaScript, which
was what that question was asking about. I apologize if this is not
the right term. I hope it is clear now what it means.

>> Should they be rendered into an iframe-like construct?
> I suggest the following:
>  1) Support two captioning formats: plain SRT (the timed strings are plain text) and HTML-extended SRT (like SRT but the timed strings are HTML fragments).

I don't like this idea since SRT is not an HTML-style markup language
and mixing HTML markup with other types of markup doesn't make sense.
I've made examples on the WHATWG mailing list for why I think it's a
bad idea. However, the basic idea of having just a simple markup with
<div @start @end> and then HTML inside it totally makes sense to me.

>  2) Establish a nested browsing context that overlays the video frame exactly.
>  3) Initialize a document into the nested browsing context as if "data:text/html;charset=utf8,<!DOCTYPE html>" had been loaded.
>  4) Set the origin on the document in the nested browsing context to the URL of the timed text file.
>  5) Associate the document in the nested browsing context with a UA style sheet along the lines of
> html {
>  display: table;
>  height: 100%;
>  font-family: sans-serif;
>  font-size: /* Computed magically from the size of the video frame. */;
>  color: white;
>  background-color: transparent; /* Show the video frame underneath */
>  text-outline: black 0.1em 0.1em; /* Just guessing with 0.1em here. */
> }
> body {
>  display: table-cell;
>  padding: 0.5em 1em 0.5em 1em;
>  vertical-align: bottom;
> }
>  6) When the video playback advances to a time given as the start time of a given timed text string, post a timed text display task on the main thread with that string as its |value|, the document of the nested browsing context as its |targetDocument| and a flag indicating whether the string is plain text or an HTML fragment.
>  7) When the timed text display task fires, make it run the equivalent of targetDocument.body.textContent = value; if the flag indicated plain text or targetDocument.body.innerHTML = value; if the flag indicated an HTML fragment.
>  8) When video playback advances to a point where the timed text needs to be cleared from view, post a timed text display task on the main thread with "" as its |value|, the document of the nested browsing context as its |targetDocument| and a flag indicating that the string is plain text.
> As a bonus, allow the author to designate a style sheet that cascades with the UA style sheet defined in point #5 above.

I like the general approach of this.
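
To check my understanding of steps 6 to 8, here is a rough sketch of
the display tasks. It makes several assumptions of my own: the function
names are hypothetical, a plain object with two setters stands in for
the document of the nested browsing context, cue times are in seconds,
and tasks run synchronously rather than being posted to the main
thread.

```javascript
// Stand-in for the nested browsing context's document (an assumption;
// a real UA would use an actual Document). It records what was last set.
function makeFakeDocument() {
  const body = {
    rendered: { kind: "text", value: "" },
    set textContent(v) { this.rendered = { kind: "text", value: v }; },
    set innerHTML(v) { this.rendered = { kind: "html", value: v }; },
  };
  return { body };
}

// Step 7: apply one display task to its target document.
function runDisplayTask(task) {
  if (task.isHtml) {
    task.targetDocument.body.innerHTML = task.value;
  } else {
    task.targetDocument.body.textContent = task.value;
  }
}

// Steps 6 and 8: one "show" task at each cue start and one "clear" task
// (empty plain-text value) at each cue end.
function buildDisplayTasks(cues, targetDocument) {
  const tasks = [];
  for (const cue of cues) {
    tasks.push({ time: cue.start, value: cue.text, isHtml: cue.isHtml, targetDocument });
    tasks.push({ time: cue.end, value: "", isHtml: false, targetDocument });
  }
  // Sort by time; on a tie, run the clear first so that it cannot wipe
  // out the cue that starts at the same instant.
  const rank = (task) => (task.value === "" ? 0 : 1);
  return tasks.sort((a, b) => a.time - b.time || rank(a) - rank(b));
}

// Run every task that is due at currentTime; returns the new cursor.
function advance(tasks, cursor, currentTime) {
  while (cursor < tasks.length && tasks[cursor].time <= currentTime) {
    runDisplayTask(tasks[cursor]);
    cursor += 1;
  }
  return cursor;
}

const captionDocument = makeFakeDocument();
const displayTasks = buildDisplayTasks(
  [
    { start: 1, end: 4, text: "Hello", isHtml: false },
    { start: 4, end: 6, text: "<i>World</i>", isHtml: true },
  ],
  captionDocument
);
let cursor = 0;
cursor = advance(displayTasks, cursor, 2); // shows "Hello" as plain text
cursor = advance(displayTasks, cursor, 4); // clears, then shows the HTML cue
cursor = advance(displayTasks, cursor, 7); // clears again
```

Note that the sketch only ever has one string on screen per document,
which is exactly the restriction discussed below.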

> The suggestion, by design, doesn't support overlapping display time ranges for two timed text strings. The suggestion could be elaborated on by allowing the timed text container to have a top/bottom alignment flag on a per string basis and making the display task flip body's vertical-align from bottom to top accordingly.

I think it is too serious a restriction to disallow overlapping time
ranges. We may have multiple timed text resources active at the same
time, e.g. subtitles in two different languages. Maybe this requires
multiple such nested browsing contexts?
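
As a small illustration of the overlap case, the following sketch
collects what would have to be displayed at a given time when two
tracks are enabled together. The data layout and the names activeCues
and activeByTrack are my own assumptions; each key of the result could
correspond to one nested browsing context.

```javascript
// Cues of one track that are active at time t (start inclusive, end exclusive).
function activeCues(track, t) {
  return track.cues.filter((cue) => cue.start <= t && t < cue.end);
}

// One entry per enabled track that has something to show at time t,
// keyed by the track's language.
function activeByTrack(tracks, t) {
  const result = {};
  for (const track of tracks) {
    if (!track.enabled) continue;
    const texts = activeCues(track, t).map((cue) => cue.text);
    if (texts.length > 0) result[track.language] = texts;
  }
  return result;
}

// Two subtitle tracks whose cues overlap in time (made-up sample data).
const subtitleTracks = [
  { language: "en", enabled: true, cues: [{ start: 1, end: 4, text: "Hello" }] },
  { language: "de", enabled: true, cues: [{ start: 2, end: 5, text: "Hallo" }] },
];

// At t = 3 both tracks have an active cue, so two strings need a home.
const shownAtThree = activeByTrack(subtitleTracks, 3);
```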


Received on Tuesday, 20 April 2010 11:44:43 UTC