Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)

From: Glenn Adams <glenn@skynav.com>
Date: Wednesday, 21 October 2015 18:59
To: "David (Standards) Singer" <singer@apple.com>
Cc: Philip Jägenstedt <philipj@opera.com>, Cyril <cyril.concolato@telecom-paristech.fr>, W3C Text Tracks CG <public-texttracks@w3.org>
Subject: Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)
Resent-From: W3C Text Tracks CG <public-texttracks@w3.org>
Resent-Date: Wednesday, 21 October 2015 19:00



On Wed, Oct 21, 2015 at 11:53 AM, Glenn Adams <glenn@skynav.com> wrote:


On Wed, Oct 21, 2015 at 7:00 AM, David Singer <singer@apple.com> wrote:

> On Oct 21, 2015, at 14:36, Philip Jägenstedt <philipj@opera.com> wrote:
>
> On Wed, Oct 21, 2015 at 2:17 PM, David Singer <singer@apple.com> wrote:
>>
>> Yes, the static transcoding case is easier.  It is, alas, not the only one.
>
> What we are talking about is the conformance requirements of
> standalone WebVTT files and what the WebVTT parser will do if
> encountering style blocks after a cue.

No, I think I must disagree.  Is such a restriction written anywhere (that files cannot be incrementally produced)?  You might argue that the incremental production case isn’t specifically included either, but I think we live in a world with more English than German rules :-)
<https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed#National_traditions>
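For concreteness, the case at issue is an incrementally produced file like the sketch below (timings and style rules made up), where a STYLE block is appended after cues have already been emitted – something the current parser would discard, since STYLE blocks are only honoured before the first cue:

```
WEBVTT

00:00.000 --> 00:02.000
First cue

STYLE
::cue { color: yellow }

00:02.000 --> 00:04.000
Second cue
```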

> In this context, static
> resources really is all that exists, as live captioning with
> <track>+WebVTT [1] hasn't been spec'd. If there are other contexts
> that use the WebVTT syntax and parser in a streaming mode, then that
> would be interesting to know. AFAICT, it would only be a situation
> like that where there could be a problem, and if it's only a
> hypothetical at this point I don't think that should affect how WebVTT
> works in the context of <track>.

No, it’s not hypothetical.  DASH/MP4/VTT relies on this, and it was (and is) seen as a core advantage of VTT over TTML.

And now that we have ISD in TTML2, we no longer have that problem (if one chooses to stream ISD instances, each of which contains only the minimal style and other header data required to process that ISD instance).

And of course you can merely chunk your stream into small TTML documents that cover an ISD time interval, in which case the only advantage of ISD is that it has already flattened the style hierarchy (ISD transmits computed style sets).
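As a rough illustration of such a chunk (element and attribute names per TTML 1/2; the ids, timings, and style values are made up), each piece of the stream is just a small, self-contained document:

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <head>
    <styling>
      <!-- only the styling this chunk actually needs -->
      <style xml:id="s1" tts:color="yellow"/>
    </styling>
  </head>
  <body>
    <div>
      <p begin="00:00:10.000" end="00:00:12.000" style="s1">Hello</p>
    </div>
  </body>
</tt>
```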

The current approach for DASH/MP4/TTML has each "sample" being a complete TTML document that includes the content for its time period and any styling etc. needed to present it – there's no need to include styling that applies exclusively to other samples. It's fine to put temporal changes that occur within the sample period inside that document, i.e. to represent multiple ISDs. From the implementation work I've seen, flattening the style hierarchy on the client side is not an intensive operation, so I'm not convinced that there's any need to move it server-side.
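To give a feel for why client-side flattening is cheap: it is essentially a walk up a chain of style references, merging attribute maps as you go. This is only a hedged sketch – the function and data shapes are illustrative, not taken from any real TTML implementation:

```python
def resolve_style(style_id, styles):
    """Flatten a TTML-like referential style chain into one computed dict.

    `styles` maps style id -> {"ref": optional parent style id,
    "attrs": {attribute: value}}. Referenced (parent) attributes apply
    first; the local attributes then override them.
    """
    entry = styles[style_id]
    computed = {}
    if entry.get("ref"):
        # recursively pull in the referenced style's computed set
        computed.update(resolve_style(entry["ref"], styles))
    computed.update(entry["attrs"])
    return computed

styles = {
    "base": {"ref": None, "attrs": {"color": "white", "fontSize": "100%"}},
    "alert": {"ref": "base", "attrs": {"color": "yellow"}},
}
print(resolve_style("alert", styles))
# -> {'color': 'yellow', 'fontSize': '100%'}
```

The cost is linear in the depth of the reference chain per styled element, which is why moving the flattening server-side (as ISD does) is a packaging convenience rather than a performance necessity.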

If what people are hoping for here is a 1:1 mapping between sample and text track cue then I'd ask again why that is thought of as being a good thing. I see big advantages in being able to serialise consecutive sets of 'cues' with a temporal granularity that can be defined based on the streaming application's requirements.

I have to admit I'm baffled here – there seem to be requirements or characteristics of the solution that people want or need that I haven't understood, but since I know that live streaming of TTML in DASH/MP4 works fine, I don't know what they are!




David Singer
Manager, Software Standards, Apple Inc.

Received on Thursday, 22 October 2015 08:20:29 UTC