Re: Roll-up captions in WebVTT

On Fri, Nov 25, 2011 at 7:17 PM, Simon Pieters <simonp@opera.com> wrote:
> On Fri, 25 Nov 2011 05:51:48 +0100, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>
>> Hi all,
>>
>> Whenever I get asked about how to implement roll-up captions in
>> WebVTT, I have to make up some half-baked solution.
>>
>> I'd like us to come up with an improvement to WebVTT that takes proper
>> care of this issue (and possibly other issues).
>>
>> In the "Caption Model" document at
>>
>> http://www.w3.org/community/texttracks/wiki/Caption_Model#4._Caption_Text_Block_Display
>> we described what "roll-up captions" are: lines of text are drawn
>> successively into the same text rendering box and removed from it,
>> too.
>>
>> When trying to specify it in WebVTT, I usually suggest the following
>> approach:
>>
>> ==
>> WEBVTT
>>
>> 1
>> 00:01:07.395 --> 00:01:10.246
>> Hey!
>>
>> 2
>> 00:01:10.246 --> 00:01:17.000
>> Hey!
>> <c .vtt_blue>You there!</c>
>>
>> 3
>> 00:01:17.000 --> 00:01:20.000
>> <c .vtt_blue>You there!</c>
>> What did you say?
>> ==
>>
>> This will create the right rendering with text moving up over time and
>> the top line disappearing, as long as the cues are positioned at the
>> same video viewport location.
>>
>> However, there are several problems with this approach:
>> * text that is unchanged has to be included multiple times (if 4 or 5
>> lines are used, it may be repeated as often as 5 times)
>> * all the markup on the text has to be repeated in every single cue
>> * finally, it's not possible to address with a single CSS statement
>> all the cues that relate to the same position - you only have the
>> choice of all cues (with ::cue), those of a particular id (with e.g.
>> ::cue(#3)), or those with a particular markup (e.g. ::cue(c),
>> ::cue(c.vtt_blue), or ::cue(v[voice='fred'])).
>>
>>
>> The issue:
>> As I see it, the problem is that we don't currently represent the
>> concept of cue text rendering boxes that persist over time. These are
>> cues that are rendered in the same location but at different times
>> along the video's timeline. We are not currently able to group such
>> cues and identify them as being a "continuation", as belonging
>> together.
>>
>> Or in other words: WebVTT doesn't currently have a concept that
>> represents what CEA708 calls "windows" (see
>> http://en.wikipedia.org/wiki/CEA-708#How_to_interpret_the_caption_stream,
>> though the term "window" is not properly explained there;
>> http://www.cpcweb.com/hdtv/708.htm may be more readable).
>>
>>
>> Proposed solution:
>> In discussions with others, we've come up with several means of
>> introducing the concept of "rendering boxes" that persist over time.
>>
>> My favorite solution, and hereby my proposal, is to introduce a "class"
>> markup on cues (rather than on fragments of cue text). This is
>> motivated by CSS, which already has classes as a grouping mechanism
>> for different rendering areas on the page; the proposal just extends
>> this concept to the time dimension. Thus, this allows grouping of
>> cues that belong together as a "continuation" of each other.
>>
>> For example:
>>
>> ===
>> WEBVTT
>>
>> 1.rollup
>> 00:01:07.395 --> 00:01:17.000
>> Hey!
>>
>> 2.rollup
>> 00:01:10.246 --> 00:01:20.000
>> <c .vtt_blue>You there!</c>
>>
>> 3.rollup
>> 00:01:17.000 --> 00:01:20.000
>> What did you say?
>> ===
>
> I just want to bikeshed the syntax and would prefer if it were a cue setting
> instead of part of the id line.
>
> 1
> 00:01:07.395 --> 00:01:17.000 rollup:foo
> Hey!
>
> In the ::cue() selector matching, the "Lists of WebVTT Node Objects" can
> have classes defined by the "rollup" setting (only a single class per cue).


Yeah, that would be another way of achieving the same effect. However,
I wouldn't call the setting "rollup", because that name would rule out
using the same mechanism to group strictly sequential pop-on cues so
that their settings can be changed together. I would regard roll-up
as just the default way of adding more lines to an existing rendering
area.

We don't currently have a way to address cues by a particular cue
setting, which is why I wasn't a fan of this approach. But restricting
the grouping to a single class per cue is indeed a requirement, since
a cue can't be rendered into more than one rendering area.
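
To make the grouping idea concrete, here is a minimal Python sketch of
how a renderer might treat cues that share one group as lines of the
same rendering area. It is only an illustration under assumptions: the
names Cue, group and visible_lines are hypothetical, cue intervals are
treated as half-open (start inclusive, end exclusive), inline markup
such as <c .vtt_blue> is left out, and none of this is WebVTT syntax
or spec behaviour.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str
    group: str    # hypothetical: the single grouping class of this cue


def visible_lines(cues, group, t):
    """Return the text lines shown in one rendering area at time t.

    Cues of the same group that are active at t are stacked in order of
    their start time, so older lines sit on top and drop out when their
    cue ends, which gives the roll-up effect.
    """
    active = [c for c in cues if c.group == group and c.start <= t < c.end]
    return [c.text for c in sorted(active, key=lambda c: c.start)]


# The three cues from the proposed example, with timestamps in seconds.
CUES = [
    Cue(67.395, 77.0, "Hey!", "rollup"),
    Cue(70.246, 80.0, "You there!", "rollup"),
    Cue(77.0, 80.0, "What did you say?", "rollup"),
]

for t in (68.0, 72.0, 78.0):
    print(t, visible_lines(CUES, "rollup", t))
```

Because each cue carries exactly one group, every cue lands in exactly
one rendering area, which is the single-class restriction discussed
above.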

Silvia.

Received on Friday, 25 November 2011 08:37:06 UTC