Re: Roll-up captions in WebVTT from Silvia Pfeiffer on 2012-04-10 (public-texttracks@w3.org from April 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 10 Apr 2012 18:22:37 +1000
To: Glenn Maynard <glenn@zewt.org>
Cc: David Singer <singer@apple.com>, Gal Klein <gal@plymedia.com>, public-texttracks@w3.org
Message-ID: <CAHp8n2=uQ1YWc8cNgkBAaQxO0Q2hqd+a=Gp0CjR2pYUTr_dMwA@mail.gmail.com>
On Tue, Dec 20, 2011 at 1:07 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Mon, Dec 19, 2011 at 7:51 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> Just to be clear, since you may unknowingly be throwing out the baby
>> with the bath water:
>> If we do not provide a means to natively display scrolling text
>> overlaid on top of the video for captions, we cannot do so for
>> subtitles either (including karaoke), nor for any other kind of text
>> we want to display scrolling on top of the video in the future,
>> including credits, nor can we faithfully represent existing TV roll-up
>> captions with WebVTT except through copying text between cues, which
>> you agree is inferior, since it involves repeating text.
>
>
> These are mostly different use cases than roll-ups, and are worth discussing
> on their own.  Explicit text scrolling (e.g. sign translations that follow
> the signs)

Agreed, that's a different use case and not one I expected to be
solved with my approach. Though it seems that Chris Giffard's approach
would do so.


> and credits seem like they would be handled differently than
> roll-ups.

I don't see why credits would be different. They are scrolling text
over a certain timeline.


>  Karaoke is a whole beast by itself (per-word timing for
> highlighting fragments as the song is sung)

Per-word timing is already solved with the inline timestamps (<00:00:00.000>),
so you can already do karaoke nicely with kind="subtitles", with
these timestamps, and with the :past and :future CSS
pseudo-classes. It's therefore just a question of whether you want text
lines scrolling or whether you want them fixed.
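To make the per-word timing concrete, here is a minimal Python sketch. The helper names (`format_timestamp`, `karaoke_payload`, `classify_words`) are hypothetical and not part of any library. The past/current/future split is an approximation of how the `:past` and `:future` pseudo-classes treat text around inline timestamps, assuming each word is preceded by the timestamp at which it starts to be sung.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def karaoke_payload(words: list[tuple[str, float]]) -> str:
    """Build cue text with an inline timestamp before every word but the first.

    The first word starts together with the cue itself, so it needs no timestamp.
    """
    if not words:
        return ""
    parts = [words[0][0]]
    for word, start in words[1:]:
        parts.append(f"<{format_timestamp(start)}>{word}")
    return " ".join(parts)


def classify_words(words: list[tuple[str, float]], now: float) -> list[tuple[str, str]]:
    """Label each word 'past', 'current' or 'future' at playback time `now`.

    Approximation: a word is future until its own timestamp is reached and
    past once the following word's timestamp has gone by.
    """
    labels = []
    for index, (word, start) in enumerate(words):
        next_start = words[index + 1][1] if index + 1 < len(words) else None
        if start > now:
            labels.append((word, "future"))
        elif next_start is not None and next_start < now:
            labels.append((word, "past"))
        else:
            labels.append((word, "current"))
    return labels


if __name__ == "__main__":
    song = [("Twinkle", 0.0), ("twinkle", 1.0), ("little", 1.5), ("star", 2.0)]
    print(karaoke_payload(song))
    print(classify_words(song, now=1.2))
```

With the words timed at 0, 1.0, 1.5 and 2.0 seconds, the payload is `Twinkle <00:00:01.000>twinkle <00:00:01.500>little <00:00:02.000>star`, and at 1.2 s only the first word is past, the second is current and the rest are future.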


> it's an interesting case, at
> least.  FWIW, I agree with Ian that trying to "faithfully represent" legacy
> content doesn't seem worthwhile in and of itself, even if that happens to be
> inconvenient to people stuck in contracts.

A best effort should be made so that all the features are at least
possible, even if not 100% identical. At the moment even a
near-faithful representation requires copying text or a
separate JavaScript rendering approach.
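The "copying text between cues" workaround can be sketched as follows, assuming a roll-up window of a fixed number of rows (CEA-608 roll-up uses two, three or four rows). Each output cue repeats the earlier lines still in the window, which is the duplication being objected to here. The function names are hypothetical, and this is only one way to do the conversion.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def rollup_as_repeated_cues(lines: list[tuple[float, str]], rows: int,
                            end_time: float) -> list[tuple[float, float, list[str]]]:
    """Emulate roll-up with ordinary cues by repeating earlier lines.

    `lines` is a list of (start_seconds, text) pairs sorted by start time.
    Each line opens a new cue that lasts until the next line arrives and
    shows up to `rows` most recent lines, oldest first.
    """
    cues = []
    for index, (start, _) in enumerate(lines):
        end = lines[index + 1][0] if index + 1 < len(lines) else end_time
        window = [text for _, text in lines[max(0, index - rows + 1): index + 1]]
        cues.append((start, end, window))
    return cues


def to_webvtt(cues: list[tuple[float, float, list[str]]]) -> str:
    """Serialise (start, end, lines) tuples as a WebVTT file."""
    blocks = ["WEBVTT"]
    for start, end, window in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(timing + "\n" + "\n".join(window))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    source = [(0.0, "A"), (2.0, "B"), (4.0, "C"), (6.0, "D")]
    cues = rollup_as_repeated_cues(source, rows=3, end_time=8.0)
    print(to_webvtt(cues))
    written = sum(len(window) for _, _, window in cues)
    print(f"{len(source)} source lines, {written} lines written")
```

Four source lines with a three-row window become four cues carrying nine lines of text in total, and the overhead grows with the window size.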


> But regarding "in the future"--we can always add new features in the future.

With that statement you have just excluded YouTube from moving to
using WebVTT for HTML5 captions. And even though YouTube should not be
the only use case that we regard, it certainly is the biggest caption
user online, so excluding them seems counterproductive.


>> My suggestion is to group cues by giving them the same "class".
>> David's suggestion is to repeat text and mark it as a repetition, thus
>> identifying the cues with the repeated text as a group. Either will
>> work, but I would prefer not having to repeat text. My suggestion does
>> not repeat text.
>
>
> I suggested grouping all captions into two groups, based on position: top
> half or bottom half (or left and right side, for vertical text).

Interesting. We should explore that further.
Note that CEA-708 has nine actual locations for rendering captions
on video. It might be that four are sufficient for grouping. What about the
center?
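Both position-based grouping ideas can be sketched in a few lines, assuming cue positions are given as percentages of the video area (as with WebVTT's `line` and `position` cue settings when percentage values are used). `half_group` follows the top/bottom split for horizontal text; `grid_group` divides the frame into a 3x3 grid, one possible reading of the nine CEA-708 locations. Both names are hypothetical, and vertical text is not handled.

```python
ROWS = ("top", "middle", "bottom")
COLUMNS = ("left", "center", "right")


def half_group(line_percent: float) -> str:
    """Two groups for horizontal text: top or bottom half of the video."""
    return "top" if line_percent < 50 else "bottom"


def grid_group(line_percent: float, position_percent: float) -> str:
    """Nine groups: a 3x3 grid over the video area.

    line_percent is the vertical offset and position_percent the horizontal
    offset, both from 0 to 100; a value of 100 is clamped into the last third.
    """
    row = min(int(line_percent * 3 // 100), 2)
    col = min(int(position_percent * 3 // 100), 2)
    return f"{ROWS[row]}-{COLUMNS[col]}"


if __name__ == "__main__":
    for line, position in [(10, 10), (50, 50), (90, 50)]:
        print(line, position, half_group(line), grid_group(line, position))
```

With halves, a cue at 50% lands in the bottom group, which is exactly where the question about the center arises; the grid gives it a middle row of its own.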


> I don't find explicitly grouping cues together objectionable.  What I don't
> like is the idea of markup that says "these cues should be rendered as
> roll-up captions".

I'm providing semantic markup that opens up the possibility to render
in a roll-up way, not a prescription to render them as roll-up. You
could have a browser setting that says to never render such cues as
roll-up, or you could load a user style sheet that overrides what the
author does. The browser would render the author's intention by
default as roll-up, since the cues go into the same group, but an override
is always possible. This is in contrast to what we have now, where we are stuck
with paint-on.
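A minimal sketch of the grouping idea, assuming each cue carries a group label (standing in for the shared "class" proposed in this thread; the `Cue` fields and `visible_lines` function are hypothetical). Within a group, active cues stack oldest first up to a fixed number of rows, so no text is repeated; a user setting that disables roll-up falls back to showing only the newest cue of each group.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float
    end: float
    text: str
    group: str  # hypothetical label standing in for a shared "class"


def visible_lines(cues: list[Cue], now: float, rows: int = 3,
                  allow_rollup: bool = True) -> dict[str, list[str]]:
    """Return the lines shown at `now`, per group, oldest first.

    With roll-up allowed, active cues in a group stack up to `rows` lines and
    the oldest drop off first. If the user disables roll-up, only the most
    recently started active cue of each group is shown.
    """
    shown: dict[str, list[str]] = {}
    for group in sorted({cue.group for cue in cues}):
        active = sorted(
            (cue for cue in cues if cue.group == group and cue.start <= now < cue.end),
            key=lambda cue: cue.start,
        )
        if not active:
            continue
        kept = active[-rows:] if allow_rollup else active[-1:]
        shown[group] = [cue.text for cue in kept]
    return shown


if __name__ == "__main__":
    track = [Cue(0, 10, "one", "bottom"), Cue(2, 10, "two", "bottom"),
             Cue(4, 10, "three", "bottom"), Cue(6, 10, "four", "bottom")]
    print(visible_lines(track, now=7))
    print(visible_lines(track, now=7, allow_rollup=False))
```

Because each cue's text is stored once, stacking becomes a rendering decision rather than part of the data, which is what leaves room for the user override.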


>> > We're talking about the use of roll-up in general, including prerecorded
>> > captions.  You said that roll-up captions are more natural to US readers,
>> > and that pop-on captions are more common with anime than other genres.
>> > That simply doesn't seem to be true; roll-up captions seem exceptionally
>> > rare outside of live captions and not even supported by many media.  I
>> > showed samples from several media formats and different countries to
>> > support this.
>>
>> And I have shown counter-examples.
>
>
> Please repeat them, as I haven't seen any examples of roll-up in prerecorded
> captions (e.g. movies).

I'm going to start a new thread with the wiki page that I have created
that captures the proposals of this thread and the requirements.

Regards,
Silvia.
Received on Tuesday, 10 April 2012 08:23:26 UTC