Roll-up captions in WebVTT from Silvia Pfeiffer on 2011-11-25 (public-texttracks@w3.org from November 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 25 Nov 2011 15:51:48 +1100
To: public-texttracks@w3.org
Message-ID: <CAHp8n2naeCC-_rp7Bjk+xu5OUu_4v3vT0Napn3veqf=oqeGN_Q@mail.gmail.com>

Hi all,

Whenever I get asked about how to implement roll-up captions in
WebVTT, I have to make up some half-baked solution.

I'd like us to come up with an improvement to WebVTT that takes proper
care of this issue (and possibly other issues).

In the "Caption Model" document at
http://www.w3.org/community/texttracks/wiki/Caption_Model#4._Caption_Text_Block_Display
we described what "roll-up captions" are: lines of text are drawn
successively into the same text rendering box and are later removed
from it again.

When trying to specify it in WebVTT, I usually suggest the following approach:

==
WEBVTT

1
00:01:07.395 --> 00:01:10.246
Hey!

2
00:01:10.246 --> 00:01:17.000
Hey!
<c.vtt_blue>You there!</c>

3
00:01:17.000 --> 00:01:20.000
<c.vtt_blue>You there!</c>
What did you say?
==

This will create the right rendering with text moving up over time and
the top line disappearing, as long as the cues are positioned at the
same video viewport location.
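
To make the workaround concrete, here is a minimal Python sketch of how
such cumulative cues could be generated from a list of timed lines. The
names rollup_cues and to_webvtt and the depth parameter are hypothetical
helpers for illustration only; they are not part of WebVTT or of any
existing tool.

```python
def rollup_cues(lines, end, depth=2):
    """Build cumulative cues; each cue repeats up to `depth` recent lines.

    lines: list of (start_timestamp, text) in display order.
    end:   timestamp at which the last cue ends.
    """
    cues = []
    for i, (start, _text) in enumerate(lines):
        # A cue lasts until the next line arrives, or until `end` for the last one.
        stop = lines[i + 1][0] if i + 1 < len(lines) else end
        # Repeat the previous lines that are still visible in the roll-up box.
        visible = [text for _start, text in lines[max(0, i - depth + 1): i + 1]]
        cues.append((start, stop, "\n".join(visible)))
    return cues


def to_webvtt(cues):
    """Serialize (start, stop, payload) tuples as a WebVTT file with numeric ids."""
    blocks = ["WEBVTT"]
    for number, (start, stop, payload) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{start} --> {stop}\n{payload}")
    return "\n\n".join(blocks) + "\n"


lines = [
    ("00:01:07.395", "Hey!"),
    ("00:01:10.246", "<c.vtt_blue>You there!</c>"),
    ("00:01:17.000", "What did you say?"),
]
cues = rollup_cues(lines, end="00:01:20.000", depth=2)
print(to_webvtt(cues))
```

With depth=2 this reproduces the three cues above; a larger depth makes
each line show up in correspondingly more cues.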

However, there are several problems with this approach:
* text that is unchanged has to be included multiple times (if 4 or 5
lines are used, it may be repeated as often as 5 times)
* all the markup on the text has to be repeated in every single cue
* finally, it's not possible to address with a single CSS statement
all the cues that relate to the same position - you only have the
choice of all cues (with ::cue), those with a particular id (e.g.
::cue(#3)), or those with a particular markup (e.g. ::cue(c),
::cue(c.vtt_blue), or ::cue(v[voice='fred'])).


The issue:
As I see it, the problem is that we don't currently represent the
concept of cue text rendering boxes that persist over time. These are
cues that are rendered in the same location but at different times
along the video's timeline. We are not currently able to group such
cues and identify them as being a "continuation", as belonging
together.

Or in other words: WebVTT doesn't currently have a concept that
represents what CEA-708 calls "windows" (see
http://en.wikipedia.org/wiki/CEA-708#How_to_interpret_the_caption_stream,
though the term "window" is not properly explained there;
http://www.cpcweb.com/hdtv/708.htm may be more readable).


Proposed solution:
In discussions with others, we've come up with several means of
introducing the concept of "rendering boxes" that persist over time.

My favorite solution, and hereby my proposal, is to introduce a "class"
markup on cues (rather than on fragments of cue text). This is
motivated by CSS, which already has classes as a grouping mechanism
for different rendering areas on the page; the proposal just extends
this concept to the time dimension. It thus allows grouping of cues
that belong together as a "continuation" of each other.

For example:

===
WEBVTT

1.rollup
00:01:07.395 --> 00:01:17.000
Hey!

2.rollup
00:01:10.246 --> 00:01:20.000
<c.vtt_blue>You there!</c>

3.rollup
00:01:17.000 --> 00:01:20.000
What did you say?
===

Text from a later cue of the same class is added below the text that
is already displayed (which is the normal scrolling behaviour of
text). Thus, this markup has the same effect as the markup given
above. But it has some massive advantages.
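
The claimed equivalence can be checked with a small Python sketch that
flattens overlapping cues of one class back into the cumulative form.
The function name flatten_group is hypothetical, and the sketch assumes
fixed-width HH:MM:SS.mmm timestamps and that text is stacked in file
order; the proposal itself does not prescribe an algorithm.

```python
def flatten_group(cues):
    """Flatten overlapping same-class cues into non-overlapping cumulative cues.

    cues: list of (start, end, text) sharing one class, in file order.
    """
    # Fixed-width HH:MM:SS.mmm timestamps compare correctly as plain strings.
    boundaries = sorted({t for start, end, _text in cues for t in (start, end)})
    flat = []
    for begin, finish in zip(boundaries, boundaries[1:]):
        # A cue is visible in this interval if it spans the whole interval.
        active = [text for start, end, text in cues
                  if start <= begin and finish <= end]
        if active:
            flat.append((begin, finish, "\n".join(active)))
    return flat


rollup = [
    ("00:01:07.395", "00:01:17.000", "Hey!"),
    ("00:01:10.246", "00:01:20.000", "<c.vtt_blue>You there!</c>"),
    ("00:01:17.000", "00:01:20.000", "What did you say?"),
]
flat = flatten_group(rollup)
for begin, finish, payload in flat:
    print(f"{begin} --> {finish}\n{payload}\n")
```

For the three .rollup cues this yields the same timings and text as the
three cues of the first example.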

Advantages:
* text does not have to be repeated
* markup of text does not have to be repeated
* we can address the cues that belong in the same rendering box
through one CSS statement: e.g. ::cue(.rollup)
* when implementing this, only a cue with a new class (or no class)
creates a new rendering area ("div")
* the rendering continuation between cues can be upheld even when
there are several other cues in the middle that don't belong to the
same continuation
* the rendering continuation between cues can be upheld even if the
rendering area's cue settings change (e.g. if the rollup has to move
from the bottom to the top of the viewport because there is some
burnt-in text visible at the bottom of the screen that would be
obstructed by the caption text)
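
As a rough illustration of the implementation point above, a renderer
could map cues to rendering areas roughly as follows. This is only a
Python sketch under my own assumptions: assign_areas is a hypothetical
name, and areas are represented as plain integer indices.

```python
def assign_areas(cue_classes):
    """Return one rendering-area index per cue.

    cue_classes: the class of each cue in file order, or None for no class.
    Cues sharing a class reuse one area; any other cue gets a fresh area.
    """
    area_of_class = {}
    result = []
    next_area = 0
    for cls in cue_classes:
        if cls is not None and cls in area_of_class:
            # Continuation: reuse the area created for this class earlier.
            result.append(area_of_class[cls])
        else:
            # New class, or no class at all: create a new rendering area.
            if cls is not None:
                area_of_class[cls] = next_area
            result.append(next_area)
            next_area += 1
    return result


areas = assign_areas(["rollup", None, "rollup", "other", "rollup"])
print(areas)
```

Here the three "rollup" cues share one area even though unrelated cues
sit between them in the file.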

I can't really see any problems with this approach other than an extra
restriction on identifier parsing: an identifier can no longer
contain a ".". Did I miss anything?
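
To show what that restriction means in practice, here is a minimal
Python sketch of splitting an identifier line into id and classes. The
function parse_identifier is hypothetical, and allowing several
dot-separated classes is my assumption by analogy with CSS; the
proposal above only shows a single class.

```python
def parse_identifier(line):
    """Split a cue identifier line into (id, [classes]) at each "."."""
    cue_id, *classes = line.strip().split(".")
    return cue_id, classes


print(parse_identifier("1.rollup"))  # id "1" with class "rollup"
print(parse_identifier("intro"))     # plain id, no class
print(parse_identifier("v1.2"))      # an id that used to contain "." is now misread
```

The last call shows the cost: an existing identifier such as "v1.2"
would be read as id "v1" with class "2".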

Cheers,
Silvia.
Received on Friday, 25 November 2011 04:52:36 UTC