Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

On Sep 2, 2012, at 2:31 , Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> On Fri, Aug 31, 2012 at 5:50 PM, David Singer <singer@apple.com> wrote:
>> 
>> On Aug 31, 2012, at 1:55 , Simon Pieters <simonp@opera.com> wrote:
>> 
>>> 
>>> I think the pipe and the dot look like noise or typos and the backslash escaping is very confusing. Authors are confused already about how things should be escaped in various languages. Let's not make it worse if we can avoid it.
>> 
>> 
>> I don't think anyone is particularly wedded to those characters.  I originally suggested [[ and ]] as the bracketing characters, for example.
>> 
>> Believe it or not, this was designed looking at the obvious case -- inline stylesheets.  We wanted a terminating line that was extremely unlikely in CSS, so the need to escape it, though formally possible, would almost never arise.  Off-hand I can't think that blank lines are ever semantically important in CSS, so it's OK to delete them if you prefer not to escape them, and likewise backslash as a line-start character would be rare.  All this means that though the escaping syntax is 'complete' (we haven't designed it so that we have a problem in future, in that anything *can* be included), it'll rarely be needed for the immediate use-case.  But there are other escape characters that have these characteristics; if taste (or other use cases) suggest other approaches, that's fine by me.
>> 
>> Tucking the style-sheet into the header also makes sense if you see it as 'presentational' rather than semantic.  Just as in days of yore you could present HTML without CSS, the semantic content of VTT should be there even if you don't style it using CSS (and we have enough intrinsic markup to achieve that, IMHO). Existing parsers skip the header; relying instead on the fact that they also skip what appear to be invalid cues is more fragile, IMHO.
> 
> 
> I would prefer if we didn't have to escape anything. But I also agree
> that pushing a header into a "broken cue" is rather fragile.
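
For concreteness, here is a minimal sketch of the whole-line escaping described above. It assumes a backslash escape at the start of a line and, purely for illustration, `]]` as the terminator line (the actual terminator was still open). The function names `escape_block` and `unescape_block` are hypothetical. Only blank lines, lines equal to the terminator, and lines already starting with the escape need touching, so the escaped block never contains a blank line and cannot end the header early.

```python
from typing import List

ESCAPE = "\\"  # assumed escape character: a backslash at line start


def escape_block(text: str, terminator: str) -> List[str]:
    """Escape a block (e.g. a style sheet) for embedding in the VTT header."""
    out = []
    for line in text.split("\n"):
        # Whole-line rule: escape only blank lines, terminator lines,
        # and lines that already begin with the escape character.
        if line == "" or line == terminator or line.startswith(ESCAPE):
            out.append(ESCAPE + line)
        else:
            out.append(line)
    out.append(terminator)  # closing line of the block
    return out


def unescape_block(lines: List[str], terminator: str) -> str:
    """Recover the original text; stops at the first unescaped terminator."""
    body = []
    for line in lines:
        if line == terminator:
            return "\n".join(body)
        # Strip exactly one leading escape character, if present.
        body.append(line[1:] if line.startswith(ESCAPE) else line)
    raise ValueError("missing terminator line")
```

For a typical compact style sheet none of the three conditions fires, so the block passes through unchanged apart from the terminator.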

It's fragile in two rather important respects.

1) We would now have to escape --> anywhere within a line, rather than only whole lines (blank lines, lines that are the terminator, or lines that start with the escape); that's worse (more searching and more error-prone).
2) Much more seriously, it looks as though one might be allowed to write

WebVTT
<header>

<stylesheet>

<cue>

<cue>

<stylesheet>

<cue>

<stylesheet>

and this leaves a huge question as to the scope of the second and third stylesheets.  The whole file?  But they come after the file has started.  From that point on?  Then the huge advantage of random-accessibility that VTT has over every other format is lost.  (You can random access today by using the header and then pre-rolling from the first cue that overlaps the desired start-time.)
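
The random-access recipe in the parenthesis can be sketched as follows. This is a minimal illustration, assuming cues are already parsed into a hypothetical `Cue` record and appear in file order with non-decreasing start times; `seek` is a hypothetical name. The point is that only the header is needed as state: nothing from earlier in the file affects how later cues are presented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def seek(header: str, cues: List[Cue], t: float) -> Tuple[str, List[Cue]]:
    """Return the header plus the cues to pre-roll from for a seek to time t."""
    for i, cue in enumerate(cues):
        # Every earlier cue ended at or before t, so none of them is visible.
        # This cue is either active at t or starts after t; start here.
        if cue.end > t:
            return header, cues[i:]
    return header, []  # nothing visible at or after t
```

A few pre-rolled cues may already have ended by t; the player simply discards them. If style sheets could appear between cues, `seek` would also have to scan backwards for every earlier sheet, which is exactly the loss of random access described above.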


Just as Silvia says about WebM, the MP4 encapsulation under way preserves the header as setup information, and then streams the cues.  I'd like to preserve that also.


Note that the proposed syntax was not a breaking change at all when the header was defined to end at the first blank line, and is still not a breaking change for well-formed files when that is true.

I'm fine with defining something with no escapes at all.  The most likely way to do that is to have a syntax where the 'open bracket' declares what the 'close bracket line' is (and the author is responsible for making sure it works). That doesn't get us out of the higher-level 'end with a blank line' part of WebVTT, and I think there are considerable advantages in fitting into that, as it enables existing parsers to ignore the data, which is just right for optional style sheets.  Note, however, that the syntax was designed so that escaping would rarely be needed in the obvious case (small, compact style-sheets).
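
To make the no-escape alternative concrete, here is a rough sketch in the style of a shell here-document. The opening-line syntax `NAME <<TERM` is entirely hypothetical (nothing like it was proposed verbatim), as is the function name. The block runs unescaped until a line exactly equal to the author-chosen terminator; a blank line inside the block is still an error, because the WebVTT header itself ends at the first blank line.

```python
from typing import Dict, List


def parse_header_blocks(header_lines: List[str]) -> Dict[str, str]:
    """Collect hypothetical 'NAME <<TERM' blocks from the header lines."""
    blocks: Dict[str, str] = {}
    i = 0
    while i < len(header_lines):
        line = header_lines[i]
        if line == "":
            break  # the WebVTT header ends at the first blank line
        if " <<" in line:
            # The opening line declares its own closing line.
            name, term = line.split(" <<", 1)
            body = []
            i += 1
            while i < len(header_lines) and header_lines[i] != term:
                if header_lines[i] == "":
                    raise ValueError("blank line inside block " + name)
                body.append(header_lines[i])
                i += 1
            if i == len(header_lines):
                raise ValueError("unterminated block " + name)
            blocks[name] = "\n".join(body)
        i += 1
    return blocks
```

The author picks a terminator that does not occur in the content, so nothing needs escaping, but the blank-line restriction remains.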


Yes, I realize that there is a difference of design philosophy going on here.  Ian clearly prefers the one-at-a-time design, whereas (for example) in MP4, we try to always ask "what general problem is this an example of?".  So when, for example, we needed a way to handle progressive-decoder-refresh in video and pre-roll in audio, we built a structure (which has since proved useful for other things) that allows the data frames of a track to be placed into named, parameterized classes, and there is a name for the pre-roll class, whose parameter is the distance.  So when I see language, style, and kind, I absolutely intuit that there may be more needs of this sort in future, and look for a general design that supports them with some sort of forward-compatibility. As long as it's not taken to extremes (a caveat that needs applying to *all* design approaches), I think it's a valid way to proceed that needs no apology or odious comparisons.

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Tuesday, 4 September 2012 22:44:28 UTC