Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)

On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>>
>> In the DASH/MP4/VTT software stack, is WebVTT the input or the output, and
>> is it a file or a stream? AFAICT, the only issue would be with a WebVTT
>> input stream (using the syntax in the spec, not any other framing) with
>> STYLE blocks at the end, but since streaming standalone WebVTT doesn't exist
>> yet I'm uncertain if that's really what you mean.
>
> These are the right questions. It is currently possible to have a
> never-ending WebVTT file produced live and delivered over HTTP (e.g.
> using chunked transfer encoding). Such a WebVTT 'stream' cannot easily
> be consumed by a browser today because the Streams API is not there yet,
> but it will be available in the future. Other (non-browser) WebVTT
> implementations can already use it today. This might require careful
> creation of cues to ensure that each cue boundary is a random access
> point, but that's possible today. Several services could be built on
> that: think of a web radio with subtitling. Regarding MP4 packaging, an
> implementation could consume such a stream and produce MP4 segments on
> the fly, if needed.
>
> For those implementations, if a new untimed style header were to arrive
> in the input WebVTT stream, and if such a style were defined to affect
> the whole 'file', i.e. including cues earlier in the 'file', then playing
> the live stream versus recording the stream and then playing it back as
> a file would not give the same result. That would be problematic. That's
> why I think that styles should either be in the header (with the
> semantics that they are valid for the whole file, and without the
> ability to appear between cues), or a timed block with a well-defined
> time validity (like cues), or settings of a cue. For the last two
> options, it really looks like WebVTT would become a multiplex of two
> types of timed data (cues and styles); I'm not sure we should go in this
> direction, or whether a separate style file/stream wouldn't be better.
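For concreteness, the live producer Cyril describes could be sketched like
this (an illustrative assumption, not an existing implementation): a
generator emits the WEBVTT header once, then one self-contained cue per
chunk, so every cue boundary is a random access point for a client joining
mid-stream. The 5-second cue duration and caption text are made up.

```javascript
// Never-ending WebVTT producer: header once, then one complete cue per
// chunk. Each chunk could be flushed as one HTTP chunk (chunked transfer
// encoding), and a consumer can resync at any cue boundary.
function* liveWebvttChunks() {
  yield "WEBVTT\n\n";
  // Format a second count as an HH:MM:SS.mmm WebVTT timestamp.
  const ts = (s) =>
    [Math.floor(s / 3600), Math.floor(s / 60) % 60, s % 60]
      .map((n) => String(n).padStart(2, "0"))
      .join(":") + ".000";
  for (let n = 0; ; n++) {
    yield `${ts(n * 5)} --> ${ts(n * 5 + 5)}\nLive caption #${n}\n\n`;
  }
}
```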

Do you have a pointer to such a never-ending WebVTT file deployed on
the public web? I honestly didn't think they would exist yet.

To be pedantic, the reason that never-ending WebVTT files don't work in
browsers isn't the Streams API, but that the media element's readyState
cannot reach HAVE_FUTURE_DATA until the text tracks are ready:
https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready

This is what the spec bug is about: some mechanism to unblock
readyState before text track parsing has finished:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029

Anyway, letting the parser discard style blocks after any cues until
we've figured out the live streaming issues is OK with me. However,
let's spell out the implications of keeping this restriction for live
streams: If you don't know all of the style up front, your only
recourse is to add a new text track at the point where new style is
needed. This will involve scripts, at which point handling multiple
WebVTT tracks will compare unfavorably with just using a WebSocket
connection to deliver cues and style using a custom syntax.
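For concreteness, that scripted recourse might look like the sketch below.
This is my illustration, not something from the thread: the function name
`addStyledTrack`, the track label, and the cue timings are assumptions; the
only real APIs relied on are the standard addTextTrack(kind, label,
language) and the VTTCue constructor.

```javascript
// When new style is needed mid-stream, create a fresh text track at that
// point instead of extending the original WebVTT resource, and populate
// it with cues from script.
function addStyledTrack(media, cues) {
  const track = media.addTextTrack("subtitles", "live-restyled", "en");
  track.mode = "showing"; // make the new track render immediately
  for (const { start, end, text } of cues) {
    track.addCue(new VTTCue(start, end, text));
  }
  return track;
}
```

Every restyle point needs another such call, which is the multi-track
bookkeeping that compares unfavorably with a custom WebSocket protocol.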

Philip

Received on Thursday, 22 October 2015 11:37:09 UTC