RE: Roll-up captions in WebVTT from Gal Klein on 2011-12-19 (public-texttracks@w3.org from December 2011)

From: Gal Klein <gal@plymedia.com>
Date: Mon, 19 Dec 2011 10:10:16 +0200
To: "'Glenn Maynard'" <glenn@zewt.org>, "'Silvia Pfeiffer'" <silviapfeiffer1@gmail.com>
Cc: "'David Singer'" <singer@apple.com>, <public-texttracks@w3.org>
Message-ID: <04b301ccbe25$a6f25af0$f4d710d0$@plymedia.com>

Hi all,

As I mentioned in previous emails, we have been doing live captioning for online video for a while now.
We never use roll-up captions, as they make it very difficult to follow along with the video (many articles have noted that roll-up captions obscure the video).
Pop-up captions can be transmitted in real time with just some basic logic added to an online captioning tool.
I would be happy to explore this with the team and show you examples (such as the WSJ front page, four times a day). Improved synchronization is also possible with some additional effort, and we are working on implementing that as well.

Best,

Gal


-----Original Message-----
From: Glenn Maynard [mailto:glenn@zewt.org] 
Sent: Monday, December 19, 2011 4:11 AM
To: Silvia Pfeiffer
Cc: David Singer; public-texttracks@w3.org
Subject: Re: Roll-up captions in WebVTT

On Fri, Dec 16, 2011 at 7:43 PM, Glenn Maynard <glenn@zewt.org> wrote:
> It's hard to do with live captions, since you can end up in situations 
> where you don't have any good place on screen to put a caption.  It'd 
> be interesting to try this sort of captioning with "live" captions (e.g.
> captions without carefully edited timing information and other 
> tweaks), and see what actually happens, though.  Maybe I'm assuming it 
> doesn't work well because of existing practice, when it's actually a solvable problem.

To follow up on this, now that I know what's actually going on: the real reason for roll-up captions is that live captions are not only added in real time, they're edited in real time. Text is appended to the end of a cue as the transcriber types, and the cue might (at least in principle) be edited in other ways as well. Pop-on captions simply can't deal with that: you'd put a caption on screen, then find that it needs more space than it was given.
Roll-up captions just scroll everything up to make room for a new, empty line.
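To make the contrast concrete, here is a minimal Python sketch of a roll-up window. It is an illustration only, assuming word-by-word input, a fixed number of rows, and a fixed row width; `RollUpWindow`, `append_word`, and `replace_last_word` are hypothetical names, not part of WebVTT, the text track API, or EIA-608. What it shows is that text being added or corrected only ever touches the bottom row, so the renderer never has to find extra space for a caption that is already on screen.

```python
class RollUpWindow:
    """Hypothetical roll-up caption window (not a real library API).

    Words arrive one at a time as the transcriber types. Only the bottom
    row is ever edited; when it runs out of room, every row scrolls up
    by one and the top row is discarded.
    """

    def __init__(self, rows: int = 3, width: int = 32) -> None:
        self.rows = rows              # number of visible rows in the window
        self.width = width            # maximum characters per row
        self.lines: list[str] = [""]  # lines[-1] is the active bottom row

    def roll_up(self) -> None:
        # Open a new empty bottom row; drop the top row if the window is full.
        self.lines.append("")
        if len(self.lines) > self.rows:
            self.lines.pop(0)

    def append_word(self, word: str) -> None:
        current = self.lines[-1]
        candidate = f"{current} {word}" if current else word
        # Roll up only if the word doesn't fit and the row already has text;
        # a single word longer than `width` is simply allowed to overflow.
        if len(candidate) > self.width and current:
            self.roll_up()
            candidate = word
        self.lines[-1] = candidate

    def replace_last_word(self, word: str) -> None:
        # A real-time correction: rewrite the last word on the bottom row.
        # Rows above it are left alone; if the corrected word no longer
        # fits, the window simply rolls up again.
        head, _, _ = self.lines[-1].rpartition(" ")
        self.lines[-1] = head
        self.append_word(word)


window = RollUpWindow(rows=2, width=12)
for word in "the quick brown fox jumps over".split():
    window.append_word(word)
print(window.lines)  # ['brown fox', 'jumps over']
window.replace_last_word("leaps")
print(window.lines)  # ['brown fox', 'jumps leaps']
```

A pop-on renderer has no equivalent move: once a caption box has been placed, growing its text means re-laying out or replacing the whole caption.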

In retrospect this seems obvious, but I figured I should mention it since it didn't occur to me and nobody corrected me.

(Of course, while the API can probably handle this, representing this sort of thing in the WebVTT file format itself is well out of scope; WebVTT cue blocks are parsed atomically.  If someone wants to support real-time captioning on the web, they'll need to define a protocol to transmit partial cues and to handle other kinds of in-place edits, as EIA-608 does in a very rudimentary way.)
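Purely as an illustration of what such a protocol might carry, the sketch below assumes a hypothetical message stream with four operations (`start`, `append`, `replace`, `end`) keyed by a cue identifier. None of these message names come from WebVTT, EIA-608, or any existing spec, and `LiveCue`, `LiveCueReceiver`, and `vtt_timestamp` are likewise hypothetical. The receiver keeps each cue open and editable until its end time arrives; only then can it be serialized as an ordinary, atomic WebVTT cue block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveCue:
    cue_id: str
    start: float
    text: str = ""
    end: Optional[float] = None  # stays None while the cue is still open


def vtt_timestamp(seconds: float) -> str:
    # WebVTT timestamps look like hh:mm:ss.ttt
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


class LiveCueReceiver:
    """Applies hypothetical partial-cue messages: start, append, replace, end."""

    def __init__(self) -> None:
        self.cues: dict[str, LiveCue] = {}

    def handle(self, msg: dict) -> None:
        op = msg["op"]
        if op == "start":
            self.cues[msg["id"]] = LiveCue(msg["id"], msg["time"])
            return
        cue = self.cues[msg["id"]]
        if op == "append":      # transcriber typed more text
            cue.text += msg["text"]
        elif op == "replace":   # in-place edit of text already sent
            cue.text = msg["text"]
        elif op == "end":       # cue is complete; its end time is now known
            cue.end = msg["time"]
        else:
            raise ValueError(f"unknown op: {op!r}")

    def to_webvtt_block(self, cue_id: str) -> str:
        # Only a finished cue can be written out as an atomic WebVTT cue block.
        cue = self.cues[cue_id]
        if cue.end is None:
            raise ValueError(f"cue {cue_id!r} is still open")
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        return f"{cue.cue_id}\n{timing}\n{cue.text}"


rx = LiveCueReceiver()
for msg in [
    {"op": "start", "id": "c1", "time": 12.0},
    {"op": "append", "id": "c1", "text": "Hello, wrold"},
    {"op": "replace", "id": "c1", "text": "Hello, world"},
    {"op": "end", "id": "c1", "time": 15.5},
]:
    rx.handle(msg)
print(rx.to_webvtt_block("c1"))
```

A real protocol would also need to settle ordering, retransmission, and how far back an edit may reach; this sketch assumes messages arrive in order and only whole-text replacement is allowed.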


On Sun, Dec 18, 2011 at 7:01 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
> I don't think you can do rollup as a preference; how would you do 
> that? I think you have to provide two different files with different 
> markup for people to choose from if you want to support both means.

If the goal is just rendering as roll-up (lines appear at the bottom, pushing old lines up and out), I don't see the problem.  The markup specifies the contents; the user preference determines the mode of rendering.
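A minimal Python sketch of that separation, assuming a simple list of timed cues and a user preference passed in as `mode`: the same cue data is rendered either pop-on (only the cues active at time `t`) or roll-up (the newest `rows` lines spoken so far). `Cue` and `render` are hypothetical names. The sketch assumes roll-up lines stay on screen until pushed out, so it ignores cue end times in that mode, and it ignores positioned cues entirely, which is exactly the kind of question a full definition would have to settle.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: float
    end: float
    text: str


def render(cues: list[Cue], t: float, mode: str, rows: int = 3) -> list[str]:
    """Return the caption lines to display at time t for the chosen mode.

    The cue data is identical in both modes; only the presentation differs.
    Positioning and styling are ignored in this sketch.
    """
    if mode == "pop-on":
        # Show exactly the cues that are active at time t.
        return [line for c in cues if c.start <= t < c.end
                for line in c.text.split("\n")]
    if mode == "roll-up":
        if rows < 1:
            raise ValueError("rows must be at least 1")
        # Stack every line spoken so far and keep only the newest `rows`
        # lines, so older lines are pushed up and out as new ones arrive.
        started = sorted((c for c in cues if c.start <= t), key=lambda c: c.start)
        lines = [line for c in started for line in c.text.split("\n")]
        return lines[-rows:]
    raise ValueError(f"unknown mode: {mode!r}")


cues = [
    Cue(0.0, 2.0, "Hi there."),
    Cue(2.0, 4.0, "How are you?"),
    Cue(4.0, 6.0, "Fine, thanks.\nAnd you?"),
]
print(render(cues, 4.5, "pop-on"))   # ['Fine, thanks.', 'And you?']
print(render(cues, 4.5, "roll-up"))  # ['How are you?', 'Fine, thanks.', 'And you?']
```

One track serves both audiences here; no second file with different markup is needed to honor the preference.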

Fully defining this would be a fair bit of work, and there are questions that would need answering (e.g. what to do with captions that specify a position on screen), but nothing requires this to be a markup feature.

> I believe it may be a cultural issue whether you prefer one style or 
> the other: in the US, rollup seems more natural and your examples all 
> seem to be from Japan, so I assume there it's more natural to swap out 
> lines. So, as an author, you'd create the captions in the appropriate 
> way for each language.

The subtitles I'm reading are written for English speakers, very often by Americans, and they're all pop-on.  Bitmap subtitles on DVDs (the form used by most movies) didn't even support roll-up, as far as I recall from implementing a decoder many years back.  All soft captions I've seen on YouTube and Netflix are pop-on.  All hard (burned-in) captions I can find on YouTube are pop-on too (found at random: Indian: http://www.youtube.com/watch?v=achG2zZTxbA; Scandinavian: http://www.youtube.com/watch?v=d6-yP7Kh6fY#t=2m).  I strongly suspect the same holds for Blu-ray releases, but I don't own any to check.  I see pop-on all over the place, and I can't even remember the last time I saw roll-up captions.

I think you're mistaken about roll-up captions being "more natural" or more common in the US.  I don't think this is a cultural issue at all, and having to create separate tracks to represent each possible user preference (never mind the combinatorial explosion this would lead to) would be a very bad way forward.

--
Glenn Maynard
Received on Monday, 19 December 2011 08:17:39 UTC