Re: Roll-up captions in WebVTT

On Mon, Dec 19, 2011 at 7:10 PM, David Singer <singer@apple.com> wrote:

> I agree it doesn't work well for long credits.  I am also a bit taken
> aback at having an idea dismissed as 'evil and ugly' before we've really
> either worked it out or seen the alternatives. Can we debate the ideas
> along with (or preferably without) the value adjectives?
>

I used strong language to express my strong distaste for the idea.  Of
course, I'd never tell anyone not to debate an idea I don't like.


> I understand it doesn't *look* clean to repeat a text line that occurs in
> two different places in two consecutive cues, but it has a number of
> advantages.
>
> The disadvantages:
> * it doesn't 'feel right' to repeat things (but the bit-rate gain is
> minimal, in my opinion)
> * tagging is needed so that systems that need to know when it has happened
> can tell (e.g. screen readers)
>

Repeating each cue dozens of times essentially turns it into a
non-human-readable, non-human-editable format.  This would be a great loss.

> The advantages:
> * no cue-to-cue dependency -- no I frames and P frames (this is pretty
> big, IMHO); each cue contains all its own text
> * allows the expression of any transition, not just scrolling: moving to
> stay with the speaker or out of the way, changes of color, background, etc.
>

Fundamentally, it's presentational rather than semantic.  If this is a
markup feature at all (and I don't believe roll-ups should be), semantic
markup ("render this cue as a roll-up caption") allows UAs to adjust the
presentation.  With presentational markup ("put the caption at position
0.9; now scroll it to position 0.8 ..."), the exact details are baked into
the captions; the renderer doesn't really know why the motion is happening
or the dependencies between various animations, so it can't really change
anything.

For example, a UA might want to allow users to say "enlarge the roll-up
area, so stuff stays on screen longer".

> * allows the use of CSS transitions to express the optionality and effect
> of the transition
>

This doesn't require repeating cues.  For example,

00:11.000 --> 00:13.000 Position:0.8 Delay:1.5 Linear:0.5 Position:0.7
<v Roger Bingham>We are in New York City

which would show the cue at L:0.8, wait 1.5 seconds, then scroll to L:0.7
over half a second.  (This is just off the top of my head; something that
translates more directly to CSS transitions--which I'm not terribly
familiar with--would be better.)
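
To make the intended reading concrete, here is a rough Python sketch of
how a renderer might interpret those settings.  Everything in it is an
assumption for illustration: the setting names (Position, Delay, Linear)
are the made-up ones from the cue above, not real WebVTT syntax, and the
helper names parse_settings and position_at are hypothetical.

```python
def parse_settings(settings):
    """Split 'Position:0.8 Delay:1.5 ...' into (name, value) steps.

    The setting names are the hypothetical ones from the example cue,
    not part of any WebVTT specification.
    """
    steps = []
    for token in settings.split():
        name, _, value = token.partition(":")
        steps.append((name.lower(), float(value)))
    return steps


def position_at(steps, t):
    """Return the cue position t seconds after the cue's start time."""
    pos = None    # most recently established position
    clock = 0.0   # time consumed by Delay/Linear steps so far
    i = 0
    while i < len(steps):
        name, value = steps[i]
        if name == "position":
            pos = value
        elif name == "delay":
            # Hold the current position for `value` seconds.
            if t < clock + value:
                return pos
            clock += value
        elif name == "linear":
            # Assumes a Linear step is always followed by its target Position.
            target = steps[i + 1][1]
            if t < clock + value:
                frac = (t - clock) / value
                return pos + (target - pos) * frac
            clock += value
            pos = target
            i += 1  # the target Position has been consumed
        i += 1
    return pos
```

With the cue above, position_at(parse_settings("Position:0.8 Delay:1.5
Linear:0.5 Position:0.7"), t) stays at 0.8 until t = 1.5, moves linearly
until t = 2.0, and stays at 0.7 afterwards.  The renderer, not the file,
decides how many intermediate frames to draw.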

I think this has reasonable use cases (e.g. sign translations that follow a
sign as it moves across the screen).  I don't think this is appropriate for
roll-up captions, but it's far less objectionable than multiple cues, where
the suggestion seemed to look like:

00:11.000 --> 00:11.500 L:0.8
<v Roger Bingham>We are in New York City

00:11.500 --> 00:11.525 L:0.79
<v Roger Bingham>We are in New York City

00:11.525 --> 00:11.550 L:0.78
<v Roger Bingham>We are in New York City

00:11.550 --> 00:11.575 L:0.77
<v Roger Bingham>We are in New York City

and so on.  That's something you can do today, if you really want to, but
it's messy, probably won't lead to very smooth scrolling, requires huge
amounts of repetition and makes the file format essentially impossible to
edit by hand.
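
To give a sense of the repetition, here is a small Python sketch that
expands one scroll into cues of the shape shown above.  The 25 ms step,
the 0.01 position increment and the L: setting are read off that example;
the function names (format_time, expand_scroll) are hypothetical, and the
sketch stops when the target position is reached rather than adding a
final cue that holds there.

```python
def format_time(ms):
    """Format milliseconds as an mm:ss.ttt timestamp."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def expand_scroll(start_ms, hold_ms, step_ms, start_pos, end_pos, text):
    """Expand one scroll into repeated cues, one per 0.01 of movement.

    Positions are tracked as integer hundredths to avoid float drift.
    """
    first = round(start_pos * 100)
    last = round(end_pos * 100)
    direction = -1 if last < first else 1

    # Initial cue: hold at the starting position.
    cues = [(start_ms, start_ms + hold_ms, first)]
    t = start_ms + hold_ms

    # One short cue per intermediate position, ending on the target.
    for p in range(first + direction, last + direction, direction):
        cues.append((t, t + step_ms, p))
        t += step_ms

    return "\n\n".join(
        f"{format_time(a)} --> {format_time(b)} L:{p / 100:g}\n{text}"
        for a, b, p in cues
    )
```

Calling expand_scroll(11000, 500, 25, 0.8, 0.7, "<v Roger Bingham>We are
in New York City") produces eleven cues for a single line of dialogue, and
every one of them repeats the full text.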

-- 
Glenn Maynard

Received on Tuesday, 20 December 2011 00:51:20 UTC