
[Bug 14104] <track> Live captioning

From: <bugzilla@jessica.w3.org>
Date: Sun, 04 Dec 2011 01:53:52 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RX1H6-0002Oe-MI@jessica.w3.org>

--- Comment #22 from Silvia Pfeiffer <silviapfeiffer1@gmail.com> 2011-12-04 01:53:49 UTC ---
(In reply to comment #21)
> For those live streams, the video seems to include an internal time, which the
> captions presumably use as well. So that's rather different than what you were
> proposing.

It's the time since the stream was started, and that's exactly what I was
referring to. I don't understand how that makes a difference.

> For that kind of case, what we'd really want is not a static file to download,
> it would be a stream.

Agreed. That's what I meant by a "streaming text" file.

> You'd want to tell the server around when to start
> (presumably automatically), and you'd want to update the cues in real time,
> presumably throwing cues away that are before the earliest start time.

The streaming text file is a separate resource from the video and it contains
cues with times synchronized with the beginning of the video. New cues are
added at the end of the file. Either the server can throw away captions that
are from before the earliest start time, or the browser, which knows the start
time of the video, can tell which cues are already in the past.

> That doesn't seem too unreasonable.


> To support things like inline live corrections, though, we'd probably want a
> different format than WebVTT, or at least some variant on it. e.g.:
> --------------8<--------------
> 00:00.000 --> 00:05.000
> captions that were available before the user connected
> 01:00:00.000 --> 01:02:00.000
> bla bla bla
> LIVE--> align:middle rollup-size:3
> <01:03:11.004> Bla bla <01:03:11.353> bla <rollup> <01:03:11.653> bal
> <01:03:11.710> <redoline> bla <01:03:12.004> bla bla...
> -------------->8--------------
> ...where in a LIVE block, timestamps indicate when the following text should
> appear, <rollup> indicates when to go to the next line, <redoline> indicates
> that the current line should be deleted in favour of new text... 

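To check that I read the strawman semantics correctly, here is a rough Python interpreter for the LIVE block payload. It assumes hh:mm:ss.ttt inline timestamps, treats each whitespace-separated word as a unit, and keeps only the last `rollup-size` lines on screen; `live_state_at` is a hypothetical helper, not a proposal for the spec.

```python
import re
from typing import List

# Strawman inline tokens: <hh:mm:ss.ttt>, <rollup>, <redoline>, or a plain word.
TOKEN_RE = re.compile(r"<(\d+:\d{2}:\d{2}\.\d{3})>|<(rollup|redoline)>|(\S+)")


def to_seconds(ts: str) -> float:
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def live_state_at(payload: str, t: float, rollup_size: int = 3) -> List[str]:
    """Return the lines visible at time t (seconds) for a strawman LIVE payload."""
    lines: List[List[str]] = [[]]
    for m in TOKEN_RE.finditer(payload):
        ts, command, word = m.groups()
        if ts is not None:
            if to_seconds(ts) > t:
                break  # everything after this timestamp has not appeared yet
        elif command == "rollup":
            lines.append([])  # go to the next line
        elif command == "redoline":
            lines[-1] = []  # delete the current line; the following text replaces it
        else:
            lines[-1].append(word)
    # Only the last rollup_size lines stay on screen.
    return [" ".join(words) for words in lines[-rollup_size:]]
```

Evaluated at 01:03:11.700, the sketch still shows the uncorrected "bal"; from 01:03:11.710 on it shows the corrected line. So seeking to a time between the rollup and the correction would show the incorrect text.
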
I'd like to keep the rollup and redo-line problems separate. Rollup is a
general problem, not one specific to live captioning. We are discussing it,
with several options, in the Text Tracks Community Group right now, so I'd like
to defer the problem there. Similarly, the redo-line problem is a new one that
should be solved independently of live captioning.

So, I just want to focus on the timing part of this problem, which is also a
<track>-related problem, not just a WebVTT problem.

Your suggestion of introducing a "LIVE" cue without timing has one big problem:
all captions for a video end up being in a single cue. That's not readable, not
easy to edit, and hardly easy to re-stream: it would be difficult to determine
what is still active when seeking to a specific offset.

My approach was to allow cues to be active until the next cue appears.
(Incidentally, for rollup captions this could be adapted to being active until
the next three cues appear.)

For example, instead of this (endless) cue:

> LIVE--> align:middle rollup-size:3
> <01:03:11.004> Bla bla <01:03:11.353> bla <rollup> <01:03:11.653> bal
> <01:03:11.710> <redoline> bla <01:03:12.004> bla bla...

you would have something like:

01:03:11.004 --> NEXT(3) align:middle
<01:03:11.004> Bla bla <01:03:11.353> bla

01:03:11.653 --> NEXT(3)
bal <01:03:11.710> <redoline> bla <01:03:12.004> bla bla...

The start time of the third cue after the current one should be easy to
determine in code.
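
For example, a Python sketch of that resolution step, assuming the cues are held in memory with an optional `next_n` taken from a hypothetical `--> NEXT(n)` end time (the `Cue` class and both functions are illustrative names only). A cue whose n-th successor has not arrived yet keeps an open end and stays active at the live edge:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float
    next_n: Optional[int]  # n from a hypothetical "--> NEXT(n)"; None for an explicit end
    end: Optional[float] = None  # None means "not known yet" (still live)
    text: str = ""


def resolve_next_ends(cues: List[Cue]) -> None:
    """Sort cues by start and set each NEXT(n) end to the start of the n-th following cue."""
    cues.sort(key=lambda c: c.start)
    for i, cue in enumerate(cues):
        if cue.next_n is not None:
            j = i + cue.next_n
            cue.end = cues[j].start if j < len(cues) else None


def active_at(cues: List[Cue], t: float) -> List[Cue]:
    """Return cues with start <= t < end; an open (None) end counts as active."""
    starts = [c.start for c in cues]  # cues must already be sorted by start
    k = bisect_right(starts, t)  # cues[:k] have started by time t
    return [c for c in cues[:k] if c.end is None or t < c.end]
```

With NEXT(3) on every cue, at most three cues are active at any time, which is the rollup behaviour; NEXT(1) gives the plain "active until the next cue appears" case.
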

> This is just a
> strawman, I don't know what the right solution is here.

Yeah, I am not 100% sure what is best either, but weighing the advantages and
disadvantages of concrete markup is certainly useful.

> In particular, what should happen if you seek backwards to a point between when
> a line was rolled up and a correction was made? Should we show the incorrect
> text, or should the incorrect text be dropped in favour of the new text?

During live streaming, no seeking should be possible, so that problem would
not occur. Captions that were produced live usually go through some
post-production, which would typically remove all the redone text.

Also, events are handled as they are reached, so if the redone text is still
there, playback would replicate the original changes exactly, as it should.
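
For illustration, a post-production pass over the strawman LIVE syntax quoted earlier could look like this Python sketch. It assumes a <redoline> discards everything on the current line before it, and it keeps the last discarded timestamp so the corrected text still appears when the correction was made; `strip_redone_text` is a hypothetical name and that timing rule is my own assumption. Once the redone text is gone, replay no longer reproduces the corrections.

```python
import re
from typing import List

# Strawman tokens (no capture groups, so findall returns whole tokens).
TOKEN_RE = re.compile(r"<\d+:\d{2}:\d{2}\.\d{3}>|<rollup>|<redoline>|\S+")
TIMESTAMP_RE = re.compile(r"<\d+:\d{2}:\d{2}\.\d{3}>")


def strip_redone_text(payload: str) -> str:
    """Drop text that a later <redoline> replaced, keeping only the corrected lines."""
    out: List[str] = []
    line_start = 0  # index in `out` where the current line begins
    for tok in TOKEN_RE.findall(payload):
        if tok == "<rollup>":
            out.append(tok)
            line_start = len(out)
        elif tok == "<redoline>":
            discarded = out[line_start:]
            del out[line_start:]
            # Assumption: keep the last discarded timestamp so the correction stays timed.
            stamps = [t for t in discarded if TIMESTAMP_RE.fullmatch(t)]
            if stamps:
                out.append(stamps[-1])
        else:
            out.append(tok)
    return " ".join(out)
```
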

Received on Sunday, 4 December 2011 01:53:54 UTC
