Re: Roll-up captions in WebVTT from Silvia Pfeiffer on 2011-12-20 (public-texttracks@w3.org from December 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 20 Dec 2011 12:48:17 +1100
To: David Singer <singer@apple.com>
Cc: Glenn Maynard <glenn@zewt.org>, public-texttracks@w3.org
Message-ID: <CAHp8n2kWb5SnYSBLbUL4kXSOocev6A4XZkp1RASScrrbnK6fCQ@mail.gmail.com>
On Tue, Dec 20, 2011 at 12:01 PM, David Singer <singer@apple.com> wrote:
>
> On Dec 19, 2011, at 16:50 , Glenn Maynard wrote:
>
>> On Mon, Dec 19, 2011 at 7:10 PM, David Singer <singer@apple.com> wrote:
>> I agree it doesn't work well for long credits.  I am also a bit taken aback at having an idea dismissed as 'evil and ugly' before we've really either worked it out or seen the alternatives. Can we debate the ideas along with (or preferably without) the value adjectives?
>>
>> I used strong language to express my strong distaste for the idea.  Of course, I'd never tell anyone not to debate an idea I don't like.
>>
>> I understand it doesn't *look* clean to repeat a text line that occurs in two different places in two consecutive cues, but it has a number of advantages.
>>
>> The disadvantages:
>> * it doesn't 'feel right' to repeat things (but the bit-rate gain is minimal, in my opinion)
>> * tagging is needed so that systems that need to know when it has happened can tell (e.g. screen readers)
>>
>> Repeating each cue dozens of times essentially turns it into a non-human-readable, non-human-editable format.  This would be a great loss.
>
> Yes, I agree, add that as a disadvantage: doesn't work for long scrollable texts.
>
> But note it's not the length of the 'paragraph' that matters, but the height of the scrolling area.  A 3-line scrolling area can only possibly need 2 repeats.


Doing that for every line of text makes the file 3 times as big for
just this use case. That's nothing compared to the video content, I
agree, but we are compressing the hell out of JS files to make them
load faster and not delay the load of the rest of the Web page. It
just doesn't seem right to be so wasteful with our waiting time for
the video, in particular when we deliver small videos to mobile
phones: with a large enough caption file, we may wait longer for the
WebVTT file to download than for the video element to be set up for
playback.
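
A rough sketch of that arithmetic, in Python. It counts cue text only
(timing lines and cue ids are ignored), and the function name, the
100 made-up caption lines and the 3-line window are assumptions for
illustration, not part of any proposal in this thread:

```python
def repeated_cues(lines, window=3):
    """Build cue payloads where each cue repeats the lines still on screen.

    With a roll-up area `window` lines high, every cue carries the new
    line plus up to (window - 1) earlier lines.
    """
    cues = []
    for i in range(len(lines)):
        start = max(0, i - window + 1)
        cues.append("\n".join(lines[start:i + 1]))
    return cues


def payload_bytes(cues):
    """Total UTF-8 size of the cue text."""
    return sum(len(c.encode("utf-8")) for c in cues)


# Hypothetical caption text, only there to have something to measure.
lines = [f"caption line number {i}" for i in range(100)]

repeated = payload_bytes(repeated_cues(lines, window=3))
single = payload_bytes(lines)  # one line per cue, no repeats
ratio = repeated / single
print(f"repeated: {repeated} bytes, single: {single} bytes, ratio: {ratio:.2f}")
```

For a 3-line area the ratio comes out close to 3, since nearly every
line is sent once and then repeated twice.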



>> The advantages:
>> * no cue-to-cue dependency -- no I frames and P frames (this is pretty big, IMHO); each cue contains all its own text
>> * allows the expression of any transition, not just scrolling: moving to stay with the speaker or out of the way, changes of color, background, etc.
>>
>> Fundamentally, it's presentational rather than semantic.  If this is a markup feature at all (and I don't believe roll-ups should be), semantic markup ("render this cue as a roll-up caption")
>
> No, I am not saying that.  I am saying that the semantic mark-up is "this span is the same as a span in adjacent cues with the same ID"
>
> If it's ALSO associated with a CSS class and that class ALSO has transitions on Y-position, then it'll scroll up on systems that ALSO support CSS.
>
> so you'd see (very rough example)
>
> WEBVTT FILE
>
> 1
> 00:00:03.500 --> 00:00:05.000
> <cue-id id=1>Everyone wants the most from life</cue-id>
>
> 2
> 00:00:06.000 --> 00:00:09.000
> <cue-id id=1>Everyone wants the most from life</cue-id>
> <cue-id id=2>but they seem unwilling to work for it</cue-id>
>
> 3
> 00:00:11.000 --> 00:00:14.000 A:end
> <cue-id id=2>but they seem unwilling to work for it</cue-id>
> even though opportunities abound

Your cue-id text is always removed from screen for some time (e.g.
for 1 sec between cue 1 and cue 2, and for 2 sec between cue 2 and
cue 3). Is that on purpose?
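
To make those gaps concrete, here is a small Python sketch that takes
the three cue timings from the example above and computes the time
between one cue ending and the next starting. The helper names are
made up for illustration:

```python
def to_seconds(timestamp):
    """Convert an hh:mm:ss.ttt WebVTT timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def gaps_between(cues):
    """Seconds during which nothing is shown between consecutive cues."""
    return [
        to_seconds(following[0]) - to_seconds(current[1])
        for current, following in zip(cues, cues[1:])
    ]


# (start, end) pairs copied from the quoted example.
cues = [
    ("00:00:03.500", "00:00:05.000"),
    ("00:00:06.000", "00:00:09.000"),
    ("00:00:11.000", "00:00:14.000"),
]

print(gaps_between(cues))
```

A positive value means the repeated span disappears and then
reappears, so there is nothing on screen to transition from.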


> and then the style sheet has a general CSS transition on stuff for Y-position.


Would you do the movement then explicitly with a new pseudo-class?
E.g. :repeated

CSS:

::cue(cue-id#1) {
  top: 85%;
  transition: top 0.2s linear;
}

::cue(cue-id#1):repeated {
  top: 80%;
  transition: top 0.2s linear;
}


In my proposal, the movement would be provided by the browser and does
not need to be marked up by the author. All they do is group the cues
together. For example, your file would look like this with
time-overlapping cues:

WEBVTT FILE

1.captions
00:00:03.500 --> 00:00:09.000
Everyone wants the most from life

2.captions
00:00:06.000 --> 00:00:14.000
but they seem unwilling to work for it

3.captions
00:00:11.000 --> 00:00:14.000 A:end
even though opportunities abound


This would automatically get the browser to identify a rendering area
called ".captions" into which one line after the other is rendered.
As cue 2's text is added to cue 1, cue 1's text moves up. As cue 3's
text is added to cue 2, cue 2's text moves up and the whole cue moves
to be right-aligned. We could then add transition properties similar
to what Glenn has suggested as a cue setting, e.g.
X: position 0.2s linear.
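
As a rough illustration of the grouping, here is a Python sketch of
how a browser might work out which lines sit in the ".captions" area
at a given time. It assumes the area name is whatever follows the
first "." in the cue id and that lines stack oldest-first; the
function names are hypothetical, and alignment and transitions are
left out:

```python
from collections import defaultdict


def to_seconds(timestamp):
    """Convert an hh:mm:ss.ttt WebVTT timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def area_of(cue_id):
    """Return the rendering-area name after the first '.', or None."""
    _, dot, area = cue_id.partition(".")
    return area if dot else None


def visible_lines(cues, t):
    """Map each rendering area to its active lines at time t, oldest first."""
    areas = defaultdict(list)
    for cue_id, start, end, text in cues:
        area = area_of(cue_id)
        begin = to_seconds(start)
        if area is not None and begin <= t < to_seconds(end):
            areas[area].append((begin, text))
    # Sorting by start time puts the newest line at the bottom.
    return {name: [text for _, text in sorted(entries)]
            for name, entries in areas.items()}


# The three cues from the example file above.
CUES = [
    ("1.captions", "00:00:03.500", "00:00:09.000",
     "Everyone wants the most from life"),
    ("2.captions", "00:00:06.000", "00:00:14.000",
     "but they seem unwilling to work for it"),
    ("3.captions", "00:00:11.000", "00:00:14.000",
     "even though opportunities abound"),
]

for t in (4.0, 7.0, 12.0):
    print(t, visible_lines(CUES, t))
```

When a new cue becomes active while an older one is still showing,
the older line ends up one position higher in the list, which is the
point at which the browser would roll it up.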


>> For example, a UA might want to allow users to say "enlarge the roll-up area, so stuff stays on screen longer".
>>
>> * allows the use of CSS transitions to express the optionality and effect of the transition
>>
>> This doesn't require repeating cues.  For example,
>>
>> 00:11.000 --> 00:13.000 Position:0.8 Delay:1.5 Linear:0.5 Position:0.7
>> <v Roger Bingham>We are in New York City
>>
>> which would show the cue at L:0.8, wait 1.5 seconds, then scroll to L:0.7 over half a second.  (This is just off the top of my head; something that translates more directly to CSS transitions--which I'm not terribly familiar with--would be better.)
>
> Cool, but I think it needs to work on sub-parts of cues, not just whole ones.

Why do you need subparts of cues to move? I would think all we need is
lines of text to move, and that can be achieved by making each line a
cue of its own.


Silvia.
Received on Tuesday, 20 December 2011 01:49:06 UTC