Re: Roll-up captions in WebVTT from Glenn Maynard on 2011-12-20 (public-texttracks@w3.org from December 2011)

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 19 Dec 2011 21:07:16 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: David Singer <singer@apple.com>, Gal Klein <gal@plymedia.com>, public-texttracks@w3.org
Message-ID: <CABirCh-6Sr0xh0yn4x=U_UkKaNWijE_Daah7F26JWZDJUjbY7Q@mail.gmail.com>
On Mon, Dec 19, 2011 at 7:51 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> Just to be clear, since you may unknowingly be throwing out the baby
> with the bath water:
> If we do not provide a means to natively display scrolling text
> overlaid on top of the video for captions, we cannot do so for
> subtitles either (including karaoke), nor for any other kind of text
> we want to display scrolling on top of the video in the future,
> including credits, nor can we faithfully represent existing TV rollup
> captions with WebVTT except through copying text between cues, which
> you agree is inferior, since it involves repeating text.
>

These are mostly different use cases from roll-ups, and are worth
discussing on their own.  Explicit text scrolling (e.g. sign translations
that follow the signs) and credits seem like they would be handled
differently than roll-ups.  Karaoke is a whole beast by itself (per-word
timing for highlighting fragments as the song is sung); it's an interesting
case, at least.  FWIW, I agree with Ian that trying to "faithfully
represent" legacy content doesn't seem worthwhile in and of itself, even if
that happens to be inconvenient to people stuck in contracts.

But regarding "in the future"--we can always add new features in the future.

> My suggestion is to group cues by giving them the same "class".
> David's suggestion is to repeat text and mark it as a repetition, thus
> identifying the cues with the repeated text as a group. Either will
> work, but I would prefer not having to repeat text. My suggestion does
> not repeat text.
>

I suggested grouping all captions into two groups, based on position: top
half or bottom half (or left and right side, for vertical text).

I don't find explicitly grouping cues together objectionable.  What I don't
like is the idea of markup that says "these cues should be rendered as
roll-up captions".

> > We're talking about the use of roll-up in general, including prerecorded
> > captions.  You said that roll-up captions are more natural to US readers,
> > and that pop-on captions are more common with anime than other genres.
> > That simply doesn't seem to be true; roll-up captions seem exceptionally
> > rare outside of live captions and not even supported by many media.  I
> > showed samples from several media formats and different countries to
> > support this.
>
> And I have shown counter-examples.


Please repeat them, as I haven't seen any examples of roll-up in prerecorded
captions (e.g. movies).  I stand by my conclusion: for prerecorded captions,
roll-up captions are rare and pop-on is the norm.

> I accept your premise, but would
> like to ask you to be open to mine, too. I can also accept that
> several publishers that have started publishing or streaming video
> with captions online find it easier and currently sufficient to just
> go with the pop-on model. Indeed, it has taken YouTube 5 years of
> providing captions online for thousands of videos before the lack of
> scrolling captions hurt enough to actually implement support for it.
> But they are supporting it now and it is here to stay.
>

If you mean that WebVTT has to support it because YouTube does, I don't
agree with that (and if it took them five years before it became worth
spending the time implementing it, that seems to say something).

I'm certainly open to yours--that's why I'm spending time debating it--I
just don't find the claim that roll-up captions/subtitles are common
(outside of live captioning) to be convincing.


On Mon, Dec 19, 2011 at 8:01 PM, David Singer <singer@apple.com> wrote:

> If it's ALSO associated with a CSS class and that class ALSO has
> transitions on Y-position, then it'll scroll up on systems that ALSO
> support CSS.
>
> so you'd see (very rough example)
>
> WEBVTT FILE
>
> 1
> 00:00:03.500 --> 00:00:05.000
> <cue-id id=1>Everyone wants the most from life</cue-id>
>
> 2
> 00:00:06.000 --> 00:00:09.000
> <cue-id id=1>Everyone wants the most from life</cue-id>
> <cue-id id=2>but they seem unwilling to work for it</cue-id>
>
> 3
> 00:00:11.000 --> 00:00:14.000 A:end
> <cue-id id=2>but they seem unwilling to work for it</cue-id>
> even though opportunities abound
>
> and then the style sheet has a general CSS transition on stuff for
> Y-position.
>

Thanks for giving a concrete example.  I don't find that nearly as
objectionable.  If roll-ups were supported, though, I'd prefer a method
that requires no markup at all, because the resulting user experience will
be much more consistent.  Failing that, I'd prefer one that requires no
repetition, such as the classification idea.  Both of those make it much
easier to vary the rendering at the user's request or according to device
requirements.

For example, the above hardcodes a rollup area of two lines.  If the user
wants three lines, he's out of luck.  If rollup hardcodes four lines and
the user's font size causes that to extend halfway up the screen, it's
similarly hard for the UA to reduce the rollup to three lines.
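
To make the point concrete, here is a minimal sketch (Python, purely
illustrative) of what a UA could do if each cue carried only its own line
and the number of visible roll-up lines were a rendering setting rather than
something baked into the file.  The names Cue and visible_rollup_lines are
made up for this sketch, and it assumes a line stays on screen until newer
lines push it out (cue end times are ignored).

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # start time in seconds
    text: str     # one caption line, never repeated in later cues


def visible_rollup_lines(cues, now, max_lines):
    """Return the caption lines shown at time `now`, oldest first.

    Assumption for this sketch: a line remains visible until it is
    pushed out by newer lines, so only start times matter.
    `max_lines` is chosen by the user or the device, not by the file.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    started = sorted((c for c in cues if c.start <= now), key=lambda c: c.start)
    return [c.text for c in started[-max_lines:]]


cues = [
    Cue(3.5, "Everyone wants the most from life"),
    Cue(6.0, "but they seem unwilling to work for it"),
    Cue(11.0, "even though opportunities abound"),
]

two_lines = visible_rollup_lines(cues, now=12.0, max_lines=2)
three_lines = visible_rollup_lines(cues, now=12.0, max_lines=3)
```

With the same three cues, two_lines holds the last two lines and three_lines
holds all three; nothing in the cue data has to change to switch between
them.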

To expand on (what I understand of) Silvia's idea, you could add a CSS
property, "vtt-rollup-class".  You'd then have:

00:00:00.000 --> 00:00:00.000
<c.rollupTop>text</c>

and in your CSS, you'd have:

.rollupTop { vtt-rollup-class: bottom; }

The rollup class would indicate that the matching DOM block, if rendered as
roll-up, should be in the rollup block at the bottom, scrolling up.  "top"
would render from the top, going down.  It requires no new syntax.  But,
I'd sooner just infer this from the line position and horizontal/vertical
mode.
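
A rough sketch of that inference (Python, illustrative only): the function
name infer_rollup_group is hypothetical, and it assumes the cue's line
position is available as a percentage measured from the top of the video for
horizontal text, or from the left edge for vertical text.  Neither
assumption comes from any spec.

```python
def infer_rollup_group(line_percent: float, vertical: bool = False) -> str:
    """Pick a roll-up group from a cue's position, with no markup needed.

    Assumption: `line_percent` runs from 0 to 100, measured from the top
    (horizontal text) or from the left edge (vertical text).
    Positions in the first half map to "top"/"left", the rest to
    "bottom"/"right".
    """
    if not 0 <= line_percent <= 100:
        raise ValueError("line_percent must be between 0 and 100")
    first_half = line_percent < 50
    if vertical:
        return "left" if first_half else "right"
    return "top" if first_half else "bottom"


group_for_low_caption = infer_rollup_group(90)
group_for_vertical_caption = infer_rollup_group(10, vertical=True)
```

A caption placed near the bottom of the frame lands in the "bottom" group,
which would scroll up; one near the top lands in "top", which would scroll
down.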

-- 
Glenn Maynard
Received on Tuesday, 20 December 2011 02:07:45 UTC