Re: Roll-up captions in WebVTT

On Fri, Dec 16, 2011 at 7:43 PM, Glenn Maynard <glenn@zewt.org> wrote:
> It's hard to do with live captions, since you can end up in situations where
> you don't have any good place on screen to put a caption.  It'd be
> interesting to try this sort of captioning with "live" captions (e.g.
> captions without carefully-edited timing information and other tweaks), and
> see what actually happens, though.  Maybe I'm assuming it doesn't work well
> because of existing practice, when it's actually a solvable problem.

To follow up on this, now that I know what's actually going on: the
real reason for roll-up captions is that live captions are not only
added in realtime, they're edited in realtime.  Text is added to the
end of a cue as the transcriber types, and the cue might (at least in
principle) be edited in other ways.  That's something that pop-on
captions simply can't deal with, since you'd put a caption on screen
and then find that it needs more space than it was given.  Roll-up
captions just roll the whole thing up to make a new empty line.
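
As a rough illustration of that behavior (my own toy sketch, not taken
from any spec or player; the class name, row count and width are all
made up), a roll-up window only ever has to grow the bottom row, and
when a word doesn't fit it opens a new bottom row and lets the top one
fall off:

```python
from collections import deque


class RollUpWindow:
    """Toy roll-up caption window: a fixed number of rows of fixed width.

    Hypothetical model for illustration only; real decoders do much more.
    """

    def __init__(self, rows=3, width=32):
        self.width = width
        # deque(maxlen=rows) discards the oldest (top) row automatically
        # when a new row is appended to a full window.
        self.lines = deque([""], maxlen=rows)

    def type_word(self, word):
        """Add a word to the bottom row, rolling up if it does not fit."""
        current = self.lines[-1]
        candidate = word if not current else current + " " + word
        if len(candidate) <= self.width or not current:
            # Still fits (or the row is empty): grow the bottom row in place.
            self.lines[-1] = candidate
        else:
            # Roll up: start a new bottom row; older rows move up by one.
            self.lines.append(word)

    def render(self):
        """Return the rows currently on screen, top to bottom."""
        return list(self.lines)


window = RollUpWindow(rows=2, width=10)
for word in "live captions grow as".split():
    window.type_word(word)
print(window.render())
```

The point is that nothing ever has to be re-laid-out: text that keeps
growing always has somewhere to go.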

In retrospect this seems obvious, but I figured I should mention it
since it didn't occur to me and nobody corrected me.

(Of course, while the API can probably handle this, representing this
sort of thing in the WebVTT file format itself is well out of scope;
WebVTT cue blocks are parsed atomically.  If someone wants to support
realtime captioning on the web, they'll need to define a protocol to
transmit partial cues and to handle other types of in-place edits,
as EIA-608 does in a very rudimentary way.)
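
To make "a protocol to transmit partial cues" a little more concrete,
here is a minimal sketch of what the receiving side might look like.
Everything in it is hypothetical: the message shapes and the operation
names ("start", "append", "backspace", "end") are invented for
illustration and don't correspond to WebVTT, EIA-608 or any existing
protocol.

```python
def apply_edit(cues, message):
    """Apply one edit message to `cues`, a dict of cue id -> cue state.

    Hypothetical message format, invented for this sketch:
      {"op": "start", "id": ...}                 open a new, empty cue
      {"op": "append", "id": ..., "text": ...}   add text to the end
      {"op": "backspace", "id": ..., "count": n} remove the last n chars
      {"op": "end", "id": ...}                   mark the cue as final
    """
    op = message["op"]
    cue_id = message["id"]
    if op == "start":
        cues[cue_id] = {"text": "", "final": False}
    elif op == "append":
        cues[cue_id]["text"] += message["text"]
    elif op == "backspace":
        count = message["count"]
        if count > 0:  # guard: text[:-0] would wrongly empty the string
            cues[cue_id]["text"] = cues[cue_id]["text"][:-count]
    elif op == "end":
        cues[cue_id]["final"] = True
    else:
        raise ValueError("unknown op: %r" % (op,))
    return cues


cues = {}
stream = [
    {"op": "start", "id": "c1"},
    {"op": "append", "id": "c1", "text": "Hello wrld"},
    {"op": "backspace", "id": "c1", "count": 4},  # transcriber fixes a typo
    {"op": "append", "id": "c1", "text": "world"},
    {"op": "end", "id": "c1"},
]
for message in stream:
    apply_edit(cues, message)
print(cues["c1"])
```

A renderer would redraw the cue after each message, which is exactly
the case where the cue's size on screen isn't known up front.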


On Sun, Dec 18, 2011 at 7:01 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> I don't think you can do rollup as a preference - how would you do
> that? I think you have to provide two different files with different
> markup for people to choose from if you want to support both means.

For just rendering as roll-up (lines appear at the bottom, pushing old
lines up and out), I don't see the problem.  The markup specifies the
contents; the user preference determines the mode of rendering.

Fully defining this would be a fair bit of work, and there are
questions that would need to be answered (e.g. what do you do with
captions that specify a location on screen), but nothing requires
this to be a markup feature.
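
A minimal sketch of the idea, with the same cue data rendered either
way depending on a preference.  The function name, the (start, end,
text) tuples and the particular roll-up policy (show the last few cues
that have started, ignoring end times) are assumptions of mine, and
this deliberately ignores the open questions above, such as positioned
captions:

```python
def visible_lines(cues, now, mode, rows=3):
    """Return the caption lines to draw at time `now`, top to bottom.

    cues: list of (start, end, text) tuples, sorted by start time.
    mode: "pop-on" or "roll-up"; a user preference, not part of the markup.
    """
    if mode == "pop-on":
        # Show exactly the cues that are active right now.
        return [text for start, end, text in cues if start <= now < end]
    if mode == "roll-up":
        # Show the most recent `rows` cues that have started; each new
        # cue appears at the bottom and pushes older ones up and out.
        started = [text for start, _end, text in cues if start <= now]
        return started[-rows:]
    raise ValueError("unknown mode: %r" % (mode,))


cues = [
    (0.0, 2.0, "First line."),
    (2.0, 4.0, "Second line."),
    (4.0, 6.0, "Third line."),
]
print(visible_lines(cues, 4.5, "pop-on"))
print(visible_lines(cues, 4.5, "roll-up", rows=2))
```

The cue list is identical in both calls; only the rendering mode
differs.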

> I believe it may be a cultural issue whether you prefer one style or
> the other: in the US, rollup seems more natural and your examples all
> seem to be from Japan, so I assume there it's more natural to swap out
> lines. So, as an author, you'd create the captions in the appropriate
> way for each language.

The subtitles I'm reading are written for English speakers, very often
by Americans, and they're all "pop-on".  Bitmap subtitles on DVDs (the
form used by most movies) didn't even support roll-up (from what I
recall when I implemented a decoder many years back).  All soft
captions on YouTube and Netflix that I've seen are pop-on.  All hard
captions I can find on YouTube are pop-on (found randomly: Indian:
http://www.youtube.com/watch?v=achG2zZTxbA; Scandinavian:
http://www.youtube.com/watch?v=d6-yP7Kh6fY#t=2m).  I strongly suspect
the same for Blu-ray releases, but I don't own any to check.  I see
pop-on all over the place, and I can't even remember the last time I
saw roll-up captions.

I think you're mistaken about roll-up captions being "more natural" or
more common in the US.  I don't think this is a cultural issue at all,
and having to create separate tracks to represent each possible user
preference (never mind the combinatoric explosion this would lead to)
would be a very bad way forward.

-- 
Glenn Maynard

Received on Monday, 19 December 2011 02:11:38 UTC