Re: Roll-up captions in WebVTT

On Tue, Dec 20, 2011 at 11:10 AM, David Singer <singer@apple.com> wrote:
>
> On Dec 19, 2011, at 16:00 , Glenn Maynard wrote:
>
> On Mon, Dec 19, 2011 at 11:30 AM, David Singer <singer@apple.com> wrote:
>> It's only evil if you can't tell it's a duplicate when you need to know
>> (e.g. when using TTS), and what I am suggesting is tagging to say that, for
>> those that need to know.
>
> It's evil and ugly regardless.  For credits, you would need hundreds or even
> thousands of copies of each cue to scroll all the way up the screen.  I'm
> actually a bit taken aback that it's being put forward seriously.
>
>
> I agree it doesn't work well for long credits.  I am also a bit taken aback
> at having an idea dismissed as 'evil and ugly' before we've really either
> worked it out or seen the alternatives. Can we debate the ideas along with
> (or preferably without) the value adjectives?
>
> I understand it doesn't *look* clean to repeat a text line that occurs in
> two different places in two consecutive cues, but it has a number of
> advantages.
>
> The disadvantages:
> * it doesn't 'feel right' to repeat things (but the bit-rate gain is
> minimal, in my opinion)
> * tagging is needed so that systems that need to know when it has happened
> can tell (e.g. screen readers)

That tagging asks a lot of extra work from a lot of tools: tools
that create a text-only transcript from a WebVTT file, search engines
that crawl WebVTT files to index them, tools that use the cue text to
let users jump to cues containing the keywords they are searching
for, and of course the rendering engine, screen readers, and braille
readers, as you already noted.


> The advantages:
> * no cue-to-cue dependency -- no I frames and P frames (this is pretty big,
> IMHO); each cue contains all its own text

In case this wasn't clear: my proposal with a grouping "class" on
cues has no cue-to-cue dependency either. Each cue contains all its
own text and can be presented without any other cue. The continued
display on screen comes from the text lines overlapping in time: I
give each line of text the exact duration for which it is visible on
screen and allow it to move to a different on-screen location during
that time, when it is pushed up a line by another text line (from
another cue) that is added to the same location.
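
A rough sketch of that model in Python, for illustration only. The
names Cue, group, and visible_rows are hypothetical and are not part
of any WebVTT draft; the point is just that the rows on screen at a
given time fall out of the time overlap, with no cue referring to
another.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds; the full time the line is on screen
    group: str    # hypothetical grouping "class" shared by roll-up cues
    text: str     # each cue carries all of its own text


def visible_rows(cues, t, group):
    """Return the lines of `group` visible at time t, top row first.

    A line moves up only because a later cue in the same group is
    active at the same time; no cue needs to look at any other cue.
    """
    active = [c for c in cues if c.group == group and c.start <= t < c.end]
    active.sort(key=lambda c: c.start)  # oldest line on top, newest at bottom
    return [c.text for c in active]


cues = [
    Cue(0.0, 6.0, "rollup", "First line"),
    Cue(2.0, 8.0, "rollup", "Second line"),
    Cue(4.0, 10.0, "rollup", "Third line"),
]
```

With these three cues, visible_rows(cues, 5.0, "rollup") yields all
three lines, and at 7.0 the first line has simply expired.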

Also, when you look at your proposal in detail, you are actually
introducing a cue-to-cue dependency: any tool that wants to handle
the text in a particular cue also has to look at the cues around it
to see whether that cue's start and end times are the real start and
end times of that text, or whether the text is a repeat.
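
To illustrate the extra work, here is a minimal sketch in Python of
what such a tool would have to do under the repeated-text approach.
The function name line_spans and the (start, end, lines) tuple layout
are made up for this example. It also assumes a repeat can be
recognised as identical text in back-to-back cues, which is exactly
the guess that the proposed tagging is meant to replace.

```python
def line_spans(cues):
    """Recover the real on-screen span of each line.

    cues: list of (start, end, [lines]) in time order, where roll-up
    is expressed by repeating earlier lines in later cues.
    Returns a list of (text, first_start, last_end), repeats merged.
    """
    spans = []       # all spans, in order of first appearance
    open_spans = {}  # text -> index into spans, for lines in the previous cue
    for start, end, lines in cues:
        still_open = {}
        for text in lines:
            idx = open_spans.get(text)
            if idx is not None and spans[idx][2] == start:
                # Same text carried over from the previous cue: extend it.
                spans[idx] = (text, spans[idx][1], end)
            else:
                # New text (or a gap in time): start a fresh span.
                spans.append((text, start, end))
                idx = len(spans) - 1
            still_open[text] = idx
        open_spans = still_open
    return spans
```

Every consumer (transcript generator, indexer, keyword search) would
need some pass like this before it could trust a cue's timestamps.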


> * allows the expression of any transition, not just scrolling: moving to
> stay with the speaker or out of the way, changes of color, background, etc.

I believe jumping text to a different screen location altogether, or
transitioning it across the screen, is a different use case: there
the text location changes not because more text is added to the same
location, but through explicit positioning changes. In fact, I think
we should explore Glenn's suggestion of some CSS transition-like
markup for this use case.


Silvia.

Received on Tuesday, 20 December 2011 01:15:14 UTC