Re: Handling live translation of cues to WebVTT from Silvia Pfeiffer on 2014-01-28 (public-texttracks@w3.org from January 2014)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 28 Jan 2014 20:48:28 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: Brendan Long <B.Long@cablelabs.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <CAHp8n2mkTGK2hF=6cnPHis7TjO5g64BKc3yonTPKgaDTv8ZcmA@mail.gmail.com>

This is a complicated issue.

We will likely have to cover both bases: MSE with JavaScript-based
streamed captions (the way HLS has defined it) and live streaming
using the <video> and <track> elements. In both these cases you can
probably delay the video stream sufficiently (maybe 1-5 min latency)
to allow for the creation of full VTT cues and adding them at the end
of the VTT file.
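
As a rough illustration of "adding them at the end of the VTT file", here
is a minimal server-side sketch in Python. The function names
(format_timestamp, start_file, append_cue) are made up for this example and
do not come from any spec or library. It assumes the captioner hands over
complete cues with known start and end times, and it does no validation of
the cue text.

```python
import io


def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def start_file(stream):
    """Write the WebVTT signature line followed by a blank line."""
    stream.write("WEBVTT\n\n")


def append_cue(stream, start, end, text, cue_id=None):
    """Append one complete cue block to the end of a growing VTT stream.

    No validation is done here: the caller must make sure the text has
    no blank lines and does not contain the substring "-->".
    """
    lines = []
    if cue_id is not None:
        lines.append(str(cue_id))
    lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
    lines.append(text)
    # A blank line terminates the cue block.
    stream.write("\n".join(lines) + "\n\n")


# Usage: an in-memory stream stands in for the file the server would serve.
live = io.StringIO()
start_file(live)
append_cue(live, 0.0, 2.5, "Hello and welcome.", cue_id=1)
append_cue(live, 2.5, 6.0, "Today we talk about captions.", cue_id=2)
print(live.getvalue())
```

The delay on the video stream is what makes this workable: each cue is
written only once its end time is known.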

Then there is the WebRTC case where you are in a live conference with
others and you really don't want to have to delay the content, not
even by 5 seconds. However, in the latter case, you likely will not use
WebVTT and instead use some sort of real-time-text approach and maybe
store the results in a VTT file when recording the conference.

In any case, as a first step there are these bugs to consider:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23414

Unless we can deal with growing WebVTT files, we cannot use <track> to
deliver live captions.

Now, for real-world use cases, I've seen examples at Google and Apple
where the HLS approach with caption segment files has been used.

I've seen the use of "keep alive" cues that make sure that the caption
stream becomes gapless, so that it's possible to stall the video and
caption track to make sure all data has arrived before continuing
playback.
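
A rough Python sketch of the keep-alive idea follows. The exact form of the
keep-alive cues in those deployments is not described here, so this assumes
the simplest variant: a placeholder cue inserted into every gap between
real cues. The function name fill_gaps and the (start, end, text) tuple
layout are hypothetical.

```python
def fill_gaps(cues, keep_alive_text=""):
    """Return a copy of cues with keep-alive cues inserted into every gap.

    cues is a list of (start, end, text) tuples sorted by start time,
    with times in seconds. keep_alive_text is the payload used for the
    placeholder cues (an assumption; empty by default).
    """
    result = []
    covered_until = None  # latest end time seen so far
    for start, end, text in cues:
        if covered_until is not None and start > covered_until:
            # Nothing covers [covered_until, start): add a keep-alive cue.
            result.append((covered_until, start, keep_alive_text))
        result.append((start, end, text))
        covered_until = end if covered_until is None else max(covered_until, end)
    return result


# Usage: the gap between 2s and 5s gets a keep-alive cue.
cues = [(0, 2, "Hello."), (5, 7, "Still here.")]
for cue in fill_gaps(cues):
    print(cue)
```

With a gapless track, a player that has no cue covering the current time
can tell that caption data is still missing, rather than assuming silence.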

Anyway, I like the path that Philip is going down. I do think we need
to resolve the above bugs first and I haven't had time to dedicate to
this. It's really a v2 feature for me, but I don't want to stop this
discussion!

Cheers,
Silvia.

On Tue, Jan 28, 2014 at 7:19 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Mon, Jan 27, 2014 at 11:35 PM, Brendan Long <B.Long@cablelabs.com> wrote:
>> On Mon, 2014-01-27 at 10:01 +0700, Philip Jägenstedt wrote:
>>
>>> Should the old cue be removed and a new one inserted, or should the
>>> old cue be updated in place?
>>>
>>> Adding a new cue would be convenient, since we could notify JavaScript via
>>> oncuechange. Changing the cue would be nicer if we had an onchange event on
>>> TextTrackCue.
>>
>>> What should happen if the previous cue has already been modified by scripts?
>>
>> I don't have strong feelings about this either way. We should do whatever
>> would be consistent with how we handle script modification in other cases.
>
> There are no existing cases where the parser modified (or even knows
> about) a previous cue, so there's nothing to be consistent with.
> Anyway, I think ignoring the fact that the cue has been modified is
> the only sane thing to do here, I just wanted to check if you had
> other plans.
>
>>> What happens if there are multiple existing cues with the same id?
>>
>> The last cue in the WebVTT file is the only one that matters. All others
>> should be ignored.
>
> What I meant is that a script can add multiple cues with the same id
> while parsing is still happening, even ones that are far in the
> future. I suppose the simplest thing would be to just remove them...
>
>>> Whatever the answers to these questions, it seems like in order to be
>>> efficient, for each parsed cue, one must check if there is already a
>>> cue with the same id. This will either make the parser O(n^2) or
>>> require a hash table, i.e. more memory, and this cost will be paid by
>>> all users of WebVTT, not just live streaming with edits.
>>
>> The memory usage of a hash table would be tiny compared to the size of an
>> actual cue. Some UAs might use hash tables here anyway, to make
>> getCueById() more efficient.
>
> You are correct, it's likely that browsers will eventually optimize
> getCueById and that reusing that would not add any cost.
>
> So, to cut to the chase, I really think that trying to solve the "live
> streaming with edits" case at this point is premature. I'm not saying
> that it doesn't exist, it does, but it seems like an area where more
> experimentation (using JavaScript) would be useful to inform the spec.
> Remember that WebVTT doesn't support live streaming *without* edits
> yet.
>
> Philip
>

Received on Tuesday, 28 January 2014 09:49:19 UTC