Re: Roll-up captions in WebVTT from David Singer on 2012-02-27 (public-texttracks@w3.org from February 2012)

From: David Singer <singer@apple.com>
Date: Mon, 27 Feb 2012 11:36:07 -0800
To: public-texttracks@w3.org
Message-id: <9D9CD9BA-5CF4-4925-A953-171E90BCE2C3@apple.com>

After a few weeks on some other crises, I am trying to catch up.  Sorry for the delay.  I think we may be converging…

On Dec 19, 2011, at 16:51 , Silvia Pfeiffer wrote:

>> 
>> Not if you really want to be able to see roll-ups in general, since you'd
>> only be able to see roll-ups where the author has specified it.  Almost no
>> authors will do that--you can disagree with that premise if you want, but I
>> think I'm right--so you'd end up seeing pop-ons most of the time.
> 
> Those authors that care about rollup will use it. It means that if you
> disagree with the rollup display that an author has provided, you can
> override it to be pop-on. That you cannot override pop-on to be
> displayed as rollup is not important, seeing as there doesn't seem to
> be a need for this kind of conversion.

If we use CSS for the roll-up, then it's a question of style sheets that apply to VTT or VTT regions, and it's then possible to construct a UA that allows for user style sheets.  So it *might* be possible for a user to ask for roll-up when pop-on was authored.  In pop-on, nothing moves…


On Dec 19, 2011, at 17:14 , Silvia Pfeiffer wrote:

>> The advantages:
>> * no cue-to-cue dependency -- no I frames and P frames (this is pretty big,
>> IMHO); each cue contains all its own text
> 
> In case this wasn't clear: in my proposal with a grouping "class" on
> cues there is also no cue-to-cue dependency. Each cue contains all its
> own text and can be presented without any other cue. The continued
> display on screen comes from time overlapping the text lines. I'm
> doing this by giving a line of text the exact duration that it is
> visible on screen and allowing it to move to different on-screen
> locations during that time when it is pushed up a line by another
> text line (from another cue) that is added to the same location.
> 
> Also, when you look at your proposal in detail, you are actually
> introducing a cue-to-cue dependency, because any tool that wants to
> handle the text in a particular cue has to also look at all the other
> cues around it to see if that is actually the start time/end time of
> that text or whether it is a repeated text.

No, for the *reader* it's all I-frames.  You want to know what's on screen at that time?  The cue tells you.  Yes, if you'd been displaying from previous times, then the text might have scrolled to that position, but random access is trivial: find the earliest cue that overlaps the time you want to start at.  You don't have to seek back in time to find earlier cues that would have scrolled up but are still visible.
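
To make the random-access point concrete, here is a minimal sketch in Python, not part of any proposal.  The `Cue` class and `cue_at` function are hypothetical names, and the cue list is my own assumption of how the three-line example quoted later in this message might look if every cue carried its complete on-screen text.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # complete on-screen text for this interval


def cue_at(cues, t):
    """Return the earliest-starting cue whose [start, end) interval contains t."""
    active = [c for c in cues if c.start <= t < c.end]
    return min(active, key=lambda c: c.start) if active else None


# Hypothetical self-contained cues: each one repeats any line still on screen.
LINE1 = "Everyone wants the most from life"
LINE2 = "but they seem unwilling to work for it"
LINE3 = "even though opportunities abound"

cues = [
    Cue(3.5, 6.0, LINE1),
    Cue(6.0, 9.0, LINE1 + "\n" + LINE2),
    Cue(9.0, 11.0, LINE2),
    Cue(11.0, 14.0, LINE2 + "\n" + LINE3),
]

# Seeking straight to t = 7.0 needs only the one cue that covers it.
hit = cue_at(cues, 7.0)
print(hit.text if hit else "(nothing on screen)")
```

The lookup never inspects cues that ended before the seek time, which is the property being argued for.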

But I think you achieve this with the ID syntax, below…


>> * allows the expression of any transition, not just scrolling: moving to
>> stay with the speaker or out of the way, changes of color, background, etc.
> 
> I believe jumping text to a different screen location altogether or
> transitioning across the screen is a different use case, since that
> does not involve a change of the text location through adding more
> text to the same location, but by explicit positioning changes. In
> fact, I think we should explore Glenn's suggestion for some CSS
> transition-like markup for this use case.

I think I am trying to use CSS position-change for all of these;  to me, that's right - use CSS transitions to get the smooth movement we want.
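
As a rough illustration of what a linear transition on a vertical position would compute, here is a small Python sketch.  The function name `linear_top` is hypothetical, and the values (85% to 80% over 0.2s) are borrowed from the CSS quoted later in this message; a real UA would do this inside its CSS engine.

```python
def linear_top(elapsed, start_top=85.0, end_top=80.0, duration=0.2):
    """Linearly interpolate a 'top' percentage over a transition.

    elapsed and duration are in seconds; the result is a percentage.
    """
    if duration <= 0:
        return end_top  # no transition: jump straight to the end position
    # Clamp progress to [0, 1] so times outside the transition are stable.
    progress = min(max(elapsed / duration, 0.0), 1.0)
    return start_top + (end_top - start_top) * progress


# Sample the position at a few points during and after the transition.
for elapsed in (0.0, 0.05, 0.1, 0.2, 1.0):
    print(f"{elapsed:.2f}s -> top: {linear_top(elapsed):.2f}%")
```

The cue text and its start and end positions stay in the VTT and style sheet; only the in-between positions are computed at presentation time.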

On Dec 19, 2011, at 17:48 , Silvia Pfeiffer wrote:

> 
> Would you do the movement then explicitly with a new pseudo-class?
> E.g. :repeated

I was thinking of plain classes, but a pseudo-class is attractive as there is then one 'name' for users to write in a user style sheet to make smooth scroll happen.  Could we use a pseudo-class with your idea below?  Something like "this class applies to lines of text that move because more text has arrived to fill in their region"?

> 
> CSS:
> 
> ::cue(cue-id#1) {
>  top: 85%;
>  transition: top .2s linear;
> }
> 
> ::cue(cue-id#1):repeated {
>  top: 80%;
>  transition: top 0.2s linear;
> }
> 
> 
> In my proposal, the movement would be provided by the browser and does
> not need to be marked up by the author.

Me too.  Agreed, this is a presentational, optional, feature, that belongs in the UA.

> All they do is group the cues
> together. For example your file would look like this with time
> overlapping cues:
> 
> WEBVTT FILE
> 
> 1.captions
> 00:00:03.500 --> 00:00:09.000
> Everyone wants the most from life
> 
> 2.captions
> 00:00:06.000 --> 00:00:14.000
> but they seem unwilling to work for it
> 
> 3.captions
> 00:00:11.000 --> 00:00:14.000 A:end
> even though opportunities abound
> 
> 
> This would automatically get the browser to identify a rendering area
> called ".captions" into which one line after the other are rendered.
> As cue 2 is added to cue 1, cue 1's text moves up. As cue 3's text is
> added to cue 2, cue 2's text moves up and the whole cue moves to be
> right aligned. We could then add transition properties similar to what
> Glenn has suggested as a cue setting, e.g. X: position 0.2s linear .

If we can set a transition property on the text contents of the caption ID, so vertical movement is CSS transitioned smoothly, this would work, sure.
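
For what it's worth, here is a minimal Python sketch of how a UA might work out which lines share a region at a given time under the grouping idea.  It assumes the part of the cue identifier after the dot names the group, as in the quoted file; `parse_ts` and `visible_lines` are hypothetical names, and cue settings such as "A:end" are ignored.

```python
def parse_ts(ts):
    """Convert an hh:mm:ss.ttt timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# (identifier, start, end, text), copied from the quoted example file.
CUES = [
    ("1.captions", "00:00:03.500", "00:00:09.000",
     "Everyone wants the most from life"),
    ("2.captions", "00:00:06.000", "00:00:14.000",
     "but they seem unwilling to work for it"),
    ("3.captions", "00:00:11.000", "00:00:14.000",
     "even though opportunities abound"),
]


def visible_lines(cues, group, t):
    """Return the group's lines on screen at time t, oldest (topmost) first."""
    active = []
    for cue_id, start, end, text in cues:
        _, _, cue_group = cue_id.partition(".")
        if cue_group == group and parse_ts(start) <= t < parse_ts(end):
            active.append((parse_ts(start), text))
    return [text for _, text in sorted(active)]


for t in (7.0, 10.0, 12.0):
    print(t, visible_lines(CUES, "captions", t))
```

Each line keeps its own start and end time; the stacking order falls out of the start times, and any smooth movement between positions is left to the transition.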


On Dec 19, 2011, at 18:07 , Glenn Maynard wrote:

> I don't find explicitly grouping cues together objectionable.  What I don't like is the idea of markup that says "these cues should be rendered as roll-up captions".

We might all be in agreement here.  I want the VTT language to tell me what text is on screen, and where it goes, and something else (I suggest CSS transitions) to handle smoothness of motion.


David Singer
Multimedia and Software Standards, Apple Inc.
Received on Monday, 27 February 2012 19:36:43 UTC