Roll-up captions in WebVTT

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 25 Nov 2011 15:51:48 +1100
Message-ID: <CAHp8n2naeCC-_rp7Bjk+xu5OUu_4v3vT0Napn3veqf=oqeGN_Q@mail.gmail.com>
To: public-texttracks@w3.org

Hi all,

Whenever I'm asked how to implement roll-up captions in WebVTT, I have
to make up some half-baked solution.

I'd like us to come up with an improvement to WebVTT that takes proper
care of this issue (and possibly other issues).

In the "Caption Model" document, we described what "roll-up captions"
are: lines of text are drawn successively into the same text rendering
box and removed from it again, with the text moving up and the top line
disappearing as new lines are added.

When trying to specify it in WebVTT, I usually suggest the following approach:


00:01:07.395 --> 00:01:10.246

00:01:10.246 --> 00:01:17.000
<c.vtt_blue>You there!</c>

00:01:17.000 --> 00:01:20.000
<c.vtt_blue>You there!</c>
What did you say?

This will create the right rendering with text moving up over time and
the top line disappearing, as long as the cues are positioned at the
same video viewport location.

However, there are several problems with this approach:
* text that is unchanged has to be included multiple times (if 4 or 5
lines are used, it may be repeated as often as 5 times)
* all the markup on the text has to be repeated in every single cue
* finally, no single CSS statement can address all the cues that
relate to the same position: you only have the choice of all cues
(with ::cue), those with a particular id (e.g. ::cue(#3)), or those
with a particular markup (e.g. ::cue(c), ::cue(c.vtt_blue), or
::cue(v[voice='fred'])).
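To make the repetition cost concrete, here is a minimal Python sketch of the first approach. It assumes each caption line is given as a (start time, text) pair and that the box shows at most `max_rows` lines; `Cue`, `expand_rollup` and `copies_of` are hypothetical names, not part of WebVTT or any library. The first line's text isn't visible in the example above, so `LINE 1` stands in as a placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float       # seconds
    end: float         # seconds
    lines: List[str]   # full text of the roll-up box while this cue is shown


def expand_rollup(events: List[Tuple[float, str]], final_end: float,
                  max_rows: int) -> List[Cue]:
    """Turn timed caption lines into self-contained cues, one per change.

    Every cue repeats the last `max_rows` lines, so an unchanged line is
    copied into up to `max_rows` consecutive cues.
    """
    cues: List[Cue] = []
    for i, (start, _) in enumerate(events):
        # A cue lasts until the next line arrives (or until final_end).
        end = events[i + 1][0] if i + 1 < len(events) else final_end
        # Keep only the newest max_rows lines; older ones have rolled off.
        visible = [text for _, text in events[max(0, i - max_rows + 1): i + 1]]
        cues.append(Cue(start, end, visible))
    return cues


def copies_of(cues: List[Cue], text: str) -> int:
    """Count how many cues carry a copy of the given line."""
    return sum(line == text for cue in cues for line in cue.lines)
```

With `max_rows=2` and the times 67.395, 70.246 and 77.0 (ending at 80.0), this yields the three cues of the example: "You there!" is stored twice, and with a five-line box a line is stored five times.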

The issue:
As I see it, the problem is that we don't currently represent the
concept of cue text rendering boxes that persist over time, i.e. boxes
in which cues are rendered at the same location but at different times
along the video's timeline. We currently cannot group such cues and
identify them as a "continuation", as belonging together.
Or in other words: WebVTT doesn't currently have a concept that
represents what CEA-708 calls "windows" (the term "window" is not
properly explained in the specification itself;
http://www.cpcweb.com/hdtv/708.htm may be more readable).

Proposed solution:
In discussions with others, we've come up with several means of
introducing the concept of "rendering boxes" that persist over time.

My favorite solution, and hence my proposal, is to introduce "class"
markup on cues (rather than on fragments of cue text). This is
motivated by CSS, which already uses classes as a grouping mechanism
for different rendering areas on the page; the proposal simply extends
this concept to the time dimension. This allows grouping cues that
belong together as a "continuation" of each other.
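The grouping itself can be sketched in a few lines of Python, assuming each parsed cue carries an optional cue-level class; `Cue`, `cls` and `group_continuations` are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cue:
    ident: str
    start: float               # seconds
    end: float                 # seconds
    text: str
    cls: Optional[str] = None  # hypothetical cue-level class; None = no class


# A continuation is keyed by its class, or by the cue identifier for a
# class-less cue, so a class name can never collide with an identifier.
BoxKey = Tuple[str, str]


def group_continuations(cues: List[Cue]) -> Dict[BoxKey, List[Cue]]:
    """Group cues into continuations, each sorted by start time.

    Cues sharing a class form one continuation even when other cues are
    interleaved between them; a cue without a class stands alone.
    """
    groups: Dict[BoxKey, List[Cue]] = {}
    for cue in sorted(cues, key=lambda c: c.start):
        key = ("class", cue.cls) if cue.cls is not None else ("cue", cue.ident)
        groups.setdefault(key, []).append(cue)
    return groups
```

Keying by a (kind, name) tuple is just one design choice; it keeps class-less cues in their own boxes without reserving any special class names.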

For example:


00:01:07.395 --> 00:01:17.000

00:01:10.246 --> 00:01:20.000
<c.vtt_blue>You there!</c>

00:01:17.000 --> 00:01:20.000
What did you say?

Text from a cue of the same class is added below the text already in
the rendering box (which is the normal scrolling behaviour of text).
Thus, this markup has the same effect as the markup given above, but it
has some massive advantages:
* text does not have to be repeated
* markup of text does not have to be repeated
* we can address all the cues that belong to the same rendering box
through one CSS statement: e.g. ::cue(.rollup)
* when implementing this, only a cue with a new class (or no class)
creates a new rendering area ("div")
* the rendering continuation between cues can be upheld even when
there are several other cues in the middle that don't belong to the
same continuation
* the rendering continuation between cues can be upheld even if the
rendering area's cue settings change (e.g. if the rollup has to move
from the bottom to the top of the viewport because there is some
burnt-in text visible at the bottom of the screen that would be
obstructed by the caption text)
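A minimal Python sketch of the rendering rule, assuming the same hypothetical cue-level `cls` field: at time `t`, the active cues of one class share a box and are stacked in start-time order so newer text appears below older text, while a class-less cue gets a box of its own. `render_at` is an illustrative name, and `LINE 1` again stands in for the first line, which isn't visible in the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cue:
    ident: str
    start: float               # seconds, inclusive
    end: float                 # seconds, exclusive
    text: str
    cls: Optional[str] = None  # hypothetical cue-level class


BoxKey = Tuple[str, str]


def render_at(cues: List[Cue], t: float) -> Dict[BoxKey, List[str]]:
    """Return the visible lines of every rendering box at time t."""
    boxes: Dict[BoxKey, List[str]] = {}
    active = [c for c in cues if c.start <= t < c.end]
    # Older cues first, so each new cue's text lands below the previous one.
    for cue in sorted(active, key=lambda c: c.start):
        key = ("class", cue.cls) if cue.cls is not None else ("cue", cue.ident)
        boxes.setdefault(key, []).append(cue.text)
    return boxes
```

Fed the three cues of the second example plus an unrelated class-less cue in between, the "rollup" box shows the same lines at each moment as the repeated-text cues of the first example, and the unrelated cue only adds a separate box.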

I can't really see any problems with this approach other than an extra
restriction on identifier parsing: a cue identifier can no longer
contain a ".". Did I miss anything?
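If the class is written on the cue identifier line, parsing could look like the following Python sketch. The exact syntax isn't shown above, so this assumes a hypothetical form in which everything after the first "." is a dot-separated list of classes; `parse_cue_identifier` is an illustrative name only.

```python
from typing import List, Optional, Tuple


def parse_cue_identifier(line: str) -> Tuple[Optional[str], List[str]]:
    """Split a cue identifier line into (identifier, classes).

    Hypothetical syntax: "3.rollup" -> ("3", ["rollup"]),
    ".rollup" -> (None, ["rollup"]), "3" -> ("3", []).
    Because "." now separates the identifier from its classes, an
    identifier can no longer contain a "." itself.
    """
    ident, *classes = line.strip().split(".")
    if any(not c for c in classes):
        raise ValueError(f"empty class name in {line!r}")
    return (ident or None), classes
```

Under this assumption an existing identifier such as `chapter.1` would be read as identifier `chapter` with class `1`, which is exactly the restriction described above.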

Received on Friday, 25 November 2011 04:52:36 UTC
