Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)

On 23 Oct 2015 7:50 pm, "Cyril Concolato" <
cyril.concolato@telecom-paristech.fr> wrote:
>
> On 22/10/2015 23:33, Silvia Pfeiffer wrote:
>>
>>
>> Just so we are clear: this already exists and works well for WebVTT.
>>
> I don't know what you mean by 'exists'. There is an ongoing pull request,
> but it is not (to my knowledge) implemented in any browser (is it?) or in
> any authoring tool, which means it could be removed if never implemented.

With "this" I was referring to the creation of a sequence of independent
documents, as requested by Nigel. I think you agree that Apple's use of
WebVTT with HLS works that way. That's all I was referring to.

>>
>> It has limitations, though, such as having to repeat styles across the
>> segments, which is why we are discussing alternative approaches. HTH.
>>
> And an alternative approach might in the future replace or complement the
> current approach if the group feels the limitations are too strong, no?

Yes. I was merely replying to Nigel to explain why this discussion is
happening.

Cheers,
Silvia.

> Cyril
>>
>>
>> Cheers,
>> Silvia.
>>
>> On 23 Oct 2015 12:52 am, "Nigel Megitt" <nigel.megitt@bbc.co.uk> wrote:
>>
>>     It would also be possible to take the same approach with VTT as we
>>     have
>>     taken with TTML, which is that you have a sequence of independent
>>     documents each of which contains the styling etc needed to display
>>     itself,
>>     for whatever time period applies. Then you have something
>>     deliverable that
>>     will work, and you can separate out the problem of creating a
>>     single long
>>     document that contains "all the previous documents' content" into a
>>     different processing task. If you go down the route of timed
>>     styles then
>>     you're almost at that point anyway.
>>
>>     Nigel
>>
>>
>>     On 22/10/2015 14:40, "Cyril Concolato"
>>     <cyril.concolato@telecom-paristech.fr> wrote:
>>
>>     >On 22/10/2015 13:36, Philip Jägenstedt wrote:
>>     >> On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
>>     >> <cyril.concolato@telecom-paristech.fr> wrote:
>>     >>>> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>>     >>>> In the DASH/MP4/VTT software stack, is WebVTT the input or the
>>     >>>> output, and is it a file or a stream? AFAICT, the only issue
>>     >>>> would be with a WebVTT input stream (using the syntax in the
>>     >>>> spec, not any other framing) with STYLE blocks at the end, but
>>     >>>> since streaming standalone WebVTT doesn't exist yet I'm
>>     >>>> uncertain if that's really what you mean.
>>     >>> These are good questions. It is currently possible to have a
>>     >>> never-ending WebVTT file being produced live, delivered over HTTP
>>     >>> (e.g. using chunked transfer encoding). Such a WebVTT 'stream'
>>     >>> cannot easily be consumed by a browser today because the Streams
>>     >>> API is not there yet, but it will be available in the future.
>>     >>> Other (non-browser) WebVTT implementations
>>     >>> can already use that today. This might require careful creation
>>     >>> of cues to ensure that each point is a random access point, but
>>     >>> that's possible today.
>>     >>> Several services can be built on that: think of a WebRadio with
>>     >>> subtitling. Regarding MP4 packaging, an implementation could
>>     >>> consume such a stream and produce MP4 segments on the fly, if
>>     >>> needed.
>>     >>>
>>     >>> For those implementations, if a new untimed style header arrived
>>     >>> in the input WebVTT stream, and if such a style were defined to
>>     >>> have effect on the whole 'file', i.e. including cues earlier in
>>     >>> the 'file', then playing the live stream versus recording the
>>     >>> stream and playing it back as a file would not produce the same
>>     >>> result. That would be problematic. That's why I
>>     >>> think that styles should either be in the header (with the
>>     >>> semantics that they are valid for the whole file and cannot
>>     >>> appear between cues), or as a timed block with a well-defined
>>     >>> time validity (like cues), or as settings of a cue. For the last
>>     >>> two options, WebVTT would effectively become a multiplex of two
>>     >>> types of timed data (cues and styles); I'm not sure we should go
>>     >>> in this direction, or whether a separate style file/stream
>>     >>> wouldn't be better.
>>     >> Do you have a pointer to such a never-ending WebVTT file
>>     >> deployed on the public web?
>>     >No I don't, but that does not mean that it does not exist, nor that
>>     >we should break such a scenario.
>>     >> I honestly didn't think they would exist yet.
>>     >>
>>     >> To be pedantic, the reason that never-ending WebVTT files don't
>>     >> work in browsers isn't because of the Streams API, but because
>>     >> the media
>>     >> element's readyState cannot reach HAVE_FUTURE_DATA until the text
>>     >> tracks are ready:
>>     >>
>>     >> https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready
>>     >>
>>     >> This is what the spec bug is about, some mechanism to unblock
>>     >> readyState before text track parsing has finished:
>>     >> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
>>     >Sorry, I wasn't clear. I know about that bug. I was already assuming
>>     >that a web app would fetch the WebVTT (using XHR or fetch,
>>     >retrieving the text content as a stream), parse it, and produce the
>>     >cues in JS, not using the native browser support at all, because of
>>     >that exact bug.
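[A rough sketch of the script-side approach Cyril describes: feed chunks of a never-ending WebVTT stream into an incremental parser that emits cues as blocks complete. This is purely illustrative; it handles only timing lines and payload lines, not the full WebVTT parsing algorithm, and `createCueParser` is a hypothetical helper, not any browser API.]

```javascript
// Minimal incremental cue parser for a chunked WebVTT text stream.
// Blocks are separated by blank lines; a partial block stays buffered
// until its terminating blank line (or end of stream) arrives.
function createCueParser() {
  let buffer = "";
  const cues = [];

  function takeBlock(block) {
    const lines = block.split("\n").filter((l) => l !== "");
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex === -1) return; // "WEBVTT" header or comment: skip
    const [start, , end] = lines[timingIndex].split(/\s+/);
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join("\n") });
  }

  return {
    // Feed each network chunk as it arrives, e.g. from a fetch body reader.
    push(chunk) {
      buffer += chunk;
      let sep;
      while ((sep = buffer.indexOf("\n\n")) !== -1) {
        takeBlock(buffer.slice(0, sep));
        buffer = buffer.slice(sep + 2);
      }
    },
    // Flush a trailing, unterminated block when the stream ends.
    end() {
      if (buffer.trim() !== "") takeBlock(buffer);
      buffer = "";
    },
    cues,
  };
}
```

[Note that chunk boundaries may fall mid-line or mid-block, which is why the parser buffers rather than parsing each chunk independently.]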
>>     >>
>>     >> Anyway, letting the parser discard style blocks after any cues
>>     >> until we've figured out the live streaming issues is OK with me.
>>     >> However, let's spell out the implications of keeping this
>>     >> restriction for live streams:
>>     >I agree that it's the right approach. We should be aware of the
>>     >limitations of such an approach.
>>     >> If you don't know all of the style up front,
>>     >I agree that "if you don't know all of the style up front" you have
>>     >a problem to solve. Nigel already pointed that out as being useful
>>     >in broadcast, where you don't necessarily know all your styles in
>>     >advance. To me, there are two main approaches: using timed styles
>>     >or refreshing untimed styles.
>>     >
>>     >By timed styles, I imagine something like:
>>     >
>>     >00:01:00.000 --> 00:02:00.000 type:style
>>     >.myRedClass {
>>     >    color: red;
>>     >}
>>     >.myGreenClass {
>>     >    color: green;
>>     >}
>>     >
>>     >00:01:00.000 --> 00:01:30.000
>>     ><v.myGreenClass>Some green text
>>     >
>>     >00:01:20.000 --> 00:02:00.000
>>     ><v.myRedClass>Some red text
>>     >
>>     >A cue with a 'type' setting whose value is 'style' carries style
>>     >content, not text content. This has the advantage of giving precise
>>     >timing for the styles, and we can require styles to appear in start
>>     >time order (like cues) and before any cue with the same start time.
>>     >There are probably problems with the syntax (blank lines in CSS; I
>>     >did not follow that part of the discussion). Also, if you want to
>>     >have seekable streams you would probably have to split cues to
>>     >remove overlap (nothing different from normal cues).
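[To make the proposal concrete: a consumer of such a stream would have to look at the cue's settings to decide whether the payload is CSS or text. A minimal sketch, assuming pre-split timing line and payload; the `type:style` setting is only the proposal under discussion here, not part of the WebVTT spec:]

```javascript
// Classify a cue block as a style cue or a text cue, based on the
// proposed (non-standard) "type:style" cue setting discussed above.
function classifyCue(timingLine, payload) {
  // timingLine e.g. "00:01:00.000 --> 00:02:00.000 type:style"
  const [start, , end, ...settings] = timingLine.split(/\s+/);
  const isStyle = settings.includes("type:style");
  return isStyle
    ? { kind: "style", start, end, css: payload }
    : { kind: "text", start, end, text: payload };
}
```

[A style cue's time range would bound when its rules apply, mirroring how cue text is bounded today.]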
>>     >
>>     >Alternatively, I could also imagine something simpler like:
>>     >00:01:00.000 --> 00:01:30.000 style:color:green;
>>     >Some green text
>>     >
>>     >00:01:20.000 --> 00:02:00.000 style:color:red;
>>     >Some red text
>>     >
>>     >Maybe this could be modified to import styles instead of inlining
>>     >them; I didn't think about that. Also, as I pointed out in my
>>     >previous email, such a VTT file starts to become a multiplex of
>>     >styles and content. It may be more appropriate to define a style
>>     >stream (maybe using the WebVTT syntax) and to link the style stream
>>     >with the content stream, either from the WebVTT content file or
>>     >from an additional <track> element.
>>     >> your only
>>     >> recourse is to add a new text track at the point where new
>>     >> style is needed.
>>     >Without defining timed styles (as above), adding a new text track
>>     >is an option, but not the only one: you can use one text track and
>>     >fill it with cues coming from different WebVTT files. In the HLS
>>     >approach, every WebVTT segment would (re-)define its styles. That
>>     >does not mean you have to maintain multiple tracks.
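[The single-track idea could look roughly like this in script: concatenate the cues of successive self-contained segments while collapsing the styles each segment repeats. A sketch only; the pre-parsed segment objects and their `styles`/`cues` shape are hypothetical, not a real API.]

```javascript
// Merge several self-contained WebVTT segments (as in the HLS approach)
// into one logical track, de-duplicating the style rules each segment
// repeats. Segments are assumed to arrive in timeline order.
function mergeSegments(segments) {
  const styles = new Set(); // repeated rules collapse to one copy
  const cues = [];
  for (const seg of segments) {
    for (const rule of seg.styles) styles.add(rule);
    cues.push(...seg.cues);
  }
  return { styles: [...styles], cues };
}
```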
>>     >> This will involve scripts, at which point handling multiple
>>     >> WebVTT tracks will compare unfavorably with just using a WebSocket
>>     >> connection to deliver cues and style using a custom syntax.
>>     >Maybe in some cases the WebSocket approach can be useful, but there
>>     >are other issues as well, like caching.
>>     >
>>     >--
>>     >Cyril Concolato
>>     >Multimedia Group / Telecom ParisTech
>>     >http://concolato.wp.mines-telecom.fr/
>>     >@cconcolato
>>     >
>>     >
>>
>
>
> --
> Cyril Concolato
> Multimedia Group / Telecom ParisTech
> http://concolato.wp.mines-telecom.fr/
> @cconcolato
>

Received on Friday, 23 October 2015 10:07:50 UTC