[whatwg] <video><overlay> for captions/subtitles/etc

Philip, all,

On Sun, Nov 29, 2009 at 9:37 PM, Philip Jägenstedt <philipj at opera.com> wrote:
> On Sun, 29 Nov 2009 06:21:45 +0100, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>> My <itext> wasn't supposed to stay a JavaScript implementation. In
>> fact, it had the exact same purpose as your <overlay> proposal: to
>> eventually be added into the HTML5 specification and be properly
>> integrated, such that it didn't have to rely on the timeupdate.
>> In fact, the <itextlist>/<itext> proposal, which was my second
>> improvement, see
>> https://wiki.mozilla.org/Accessibility/HTML5_captions_v2, doesn't look
>> very different to what you have there.
> Yes, that is very clear, I used it only as an example of what needs to be
> done to parse SRT with JavaScript. Go ahead and edit the wiki if there's
> anything that makes it sound like <itext> is something it is not.

I guess what I was just missing is a mention of what your proposal
provides on top of what I had. You're stating that further down in
your email, so it might be good to mention that. It also shows we are
making progress. :-)

>> I think you've taken the next step with proposing to add a wrapping
>> <div> into the DOM - something I wasn't quite sure would be possible
>> and I'm glad you've taken the step.
>> Another comment on naming: whether we name the elements <itextlist>
>> and <itext> or alternatively <overlay> and <source>, I'm not too
>> fussed. In fact, I've discussed the renaming/reuse of <source> for
>> <itext> in my recent blog post at
>> http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/
>> . I think it may well make a lot of sense since we can reduce the key
>> required attributes to the ones that already exist for the <source>
>> element.
> Indeed, my proposal is mainly a remix of <itext> and cue ranges. The main
> selling point, though, is a consistent markup and DOM for in-band, external
> and script-created subtitles and a hook to get content into the fullscreen mode.

This is where we are indeed making progress - excellent!

I must admit, I am still a bit dubious about how you are proposing to
deal with in-band captions. Is a UA expected to take them out of the
file and directly render them into <overlay>? Then you don't get the
kind of control you get as a Web author over external captions, e.g.
to specify a media query.

Also, the available tracks are not exposed to the user, so he/she
cannot choose among them interactively. I have been told that such
interactive choice of the to-be-displayed caption track is a
requirement, since people may use the subtitles/captions to learn a
new language or read in their actual native language. YouTube
certainly exposes all the available alternative language tracks - also
because some of these tracks are actually created on the fly by
automated translation. These are some of the reasons I was asked to
provide declarative markup of all of the available subtitle tracks of
video, no matter whether they came out of the media file (in-line) or

So, maybe we can use <source> to not just point at further external
subtitle tracks, but also at in-band subtitle tracks and thus really
make in-band identical to out-of-band? We could even use Media
Fragment URI addressing for such an approach, e.g.

<source src="captions-english.srt" lang="en"></source>
<source src="video.ogv?track=subtitle[de]" lang="de"></source>

or alternatively if no file was given in the @src attribute of a
<source> element, it would be clear that it pointed at a track in the
original media file like so:

<source lang="de"></source>
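
To make the two variants above a bit more concrete, here is a rough
sketch of how a script (or a UA) might tell an external subtitle file
apart from an in-band track. Everything in it is an assumption for
illustration: the classifyTrackSource name, the returned object shape,
and the "track=subtitle[de]" syntax, which is only the example from
this email and not a finalised Media Fragment URI syntax.

```javascript
// Hypothetical helper (not part of any proposal): classify a <source>-like
// descriptor as an external subtitle file or an in-band track of the video.
function classifyTrackSource(descriptor, mediaSrc) {
  const src = descriptor.src;
  const lang = descriptor.lang;

  // Variant 2 from the email: no src given, so assume the track lives in
  // the original media file.
  if (!src) {
    return { kind: "in-band", resource: mediaSrc, lang: lang };
  }

  // Split "resource?query" by hand; the bracket syntax is illustrative only.
  const queryStart = src.indexOf("?");
  const resource = queryStart === -1 ? src : src.slice(0, queryStart);
  const query = queryStart === -1 ? "" : src.slice(queryStart + 1);

  // Variant 1: the src addresses a track inside the media resource itself.
  const match = /(?:^|&)track=([a-z]+)\[([^\]]+)\]/i.exec(query);
  if (match && resource === mediaSrc) {
    return {
      kind: "in-band",
      resource: resource,
      trackType: match[1],
      lang: lang || match[2],
    };
  }

  // Anything else is treated as an external (out-of-band) file.
  return { kind: "external", resource: resource, lang: lang };
}
```

With something like this, all three forms end up in one list of tracks
that could be offered to the user, regardless of where the text
actually comes from.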

About the cue ranges:

If I understand your approach, then it means that if the video ends up
playing at a time that is between a registered cue range's start and
end time, the given DOMString text would be added to the <overlay>
element and displayed. Is this correct?
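
In other words, this is how I read it, as a minimal sketch. The cue
shape ({ start, end, text }), the activeCueText name and the choice to
treat the end time as exclusive are my assumptions, not something
taken from your proposal:

```javascript
// Hypothetical cue shape: { start, end, text }, with times in seconds.
// Returns the texts of all cues whose range contains currentTime.
// Assumption: start is inclusive and end is exclusive.
function activeCueText(cues, currentTime) {
  return cues
    .filter((cue) => currentTime >= cue.start && currentTime < cue.end)
    .map((cue) => cue.text);
}

const exampleCues = [
  { start: 0, end: 4, text: "Hello" },
  { start: 3, end: 6, text: "World" },
];

// At 3.5 seconds both cues overlap, so both texts would be shown.
const shownAt3_5 = activeCueText(exampleCues, 3.5);
```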

Would it not be better to register onenter and onleave functions that
could do anything to the DOM, rather than restrict the cue's effect on
the <overlay> part of the DOM? Maybe the slides that I want to show
should be presented in a different <div> on the page and not as an
overlay on the video? I must admit I am not quite sure about the best
approach to solving cue ranges - still trying to figure out all the
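
As a rough sketch of the callback variant: the createCueTracker name
and the cue shape below are hypothetical, and the only existing pieces
assumed are the video element's timeupdate event and currentTime
attribute, which would drive the update function in a page.

```javascript
// Hypothetical tracker: calls cue.onenter / cue.onleave when the playback
// time moves into or out of a cue's [start, end) range.
function createCueTracker(cues) {
  const active = new Set();
  return function update(currentTime) {
    for (const cue of cues) {
      const inside = currentTime >= cue.start && currentTime < cue.end;
      if (inside && !active.has(cue)) {
        active.add(cue);
        if (cue.onenter) cue.onenter(cue);
      } else if (!inside && active.has(cue)) {
        active.delete(cue);
        if (cue.onleave) cue.onleave(cue);
      }
    }
  };
}

// Usage sketch: the callbacks may touch any part of the page, e.g. a
// separate <div> for slides. Here they only record what they would do.
const slideLog = [];
const updateSlides = createCueTracker([
  {
    start: 10,
    end: 20,
    onenter: () => slideLog.push("show slide 1"),
    onleave: () => slideLog.push("hide slide 1"),
  },
]);

// In a page this would be wired up roughly as:
//   video.addEventListener("timeupdate", () => updateSlides(video.currentTime));
```

Since the callbacks are free to do anything, writing text into
<overlay> becomes just one possible onenter handler rather than the
only effect a cue can have.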

>> I am a little hesitant about the use of "overlay": it is a name that
>> implies a visual representation. I don't think we should prescribe how
>> the <div> needs to be represented. In fact, for a textual audio
>> description, I would prefer not to have a screen display and only have
>> the screen reader aria-live activated. That is not an overlay any
>> longer. I think in the past HTML has tried to separate structure from
>> presentation where possible, with CSS being in control of presentation
>> issues.
> The name issue and using CSS is one track in the discussion on
> public-html-a11y, please do follow up on that.

Yup, will do. I want to go through all the things mentioned there
because a lot of different requirements have been touched upon that I
had assumed as given beforehand, but which may not be.

>> Anyway - I am sorry I haven't had the time to reply properly to the
>> discussion in the W3C HTML accessibility taskforce yet - I promise
>> I'll get to it.
>> Incidentally, I think the W3C HTML accessibility taskforce has
>> developed into something of a discussion centre for these captions
>> issues. If you're an HTML5 member, you might want to join the taskforce
>> and subscribe to http://lists.w3.org/Archives/Public/public-html-a11y/
>> . Otherwise, I guess, we'll end up duplicating all the discussion
>> there here again.
> I assume "you" above is other WHATWG members, not me.

Yes, indeed, I changed target audience there - I wanted to encourage
everyone else to join who hasn't considered it yet. :-)


Received on Sunday, 29 November 2009 03:42:13 UTC