Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

On Fri, Aug 31, 2012 at 5:50 PM, David Singer <singer@apple.com> wrote:
>
> On Aug 31, 2012, at 1:55 , Simon Pieters <simonp@opera.com> wrote:
>
>>
>> I think the pipe and the dot look like noise or typos and the backslash escaping is very confusing. Authors are confused already about how things should be escaped in various languages. Let's not make it worse if we can avoid it.
>
>
> I don't think anyone is particularly wedded to those characters.  I originally suggested [[ and ]] as the bracketing characters, for example.
>
> Believe it or not, this was designed looking at the obvious case -- inline stylesheets.  We wanted a terminating line that was extremely unlikely in CSS, so the need to escape it, though formally possible, would almost never arise.  Off-hand I can't think that blank lines are ever semantically important in CSS, so it's OK to delete them if you prefer not to escape them, and likewise backslash as a line-start character would be rare.  All this means that though the escaping syntax is 'complete' (we haven't designed it so that we have a problem in future, in that anything *can* be included), it'll rarely be needed for the immediate use-case.  But there are other escape characters that have these characteristics; if taste (or other use cases) suggest other approaches, that's fine by me.
>
> Tucking the style-sheet into the header also makes sense if you see it as 'presentational' rather than semantic.  Just like in days of yore you could present HTML without CSS, the semantic content of VTT should be there even if you don't style it using CSS (and we have enough intrinsic markup to achieve that, IMHO). Existing parsers skip the header; relying on the fact that they also skip what mostly appear to be invalid cues is more fragile, IMHO.


I would prefer it if we didn't have to escape anything. But I also
agree that pushing a header into a "broken cue" is rather fragile. I am
in particular concerned about how it would be treated by encapsulations
that follow the parsing algorithm of WebVTT. E.g. say you're parsing a
WebVTT file according to its structure to encapsulate it into WebM:
you would identify the header up to the first empty line, then
identify the cues. When you identify a cue that you cannot give a time
segment to (because it has none), you drop that cue on the floor. This
means that a WebM encapsulation would always drop an inline style
sheet.
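
A minimal sketch of that failure mode, in Python. It is an illustration
only, not the code of any real WebM muxer: the function names are made
up, a timing line is detected simply by looking for "-->", and the
STYLE block stands in for whatever a "broken cue" style sheet would
look like.

```python
def split_blocks(text):
    """Split text into blocks of consecutive non-blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def encapsulate(text):
    """Return (header, cues, dropped) for a WebVTT-like file.

    Hypothetical helper: the header is the first block (everything up
    to the first empty line); every later block is a cue candidate.
    """
    blocks = split_blocks(text)
    if not blocks:
        raise ValueError("empty file")
    header, candidates = blocks[0], blocks[1:]
    cues, dropped = [], []
    for block in candidates:
        # Simplification: any line containing "-->" counts as a timing line.
        if any("-->" in line for line in block):
            cues.append(block)
        else:
            # No time segment can be assigned, so the block is discarded.
            dropped.append(block)
    return header, cues, dropped


SAMPLE = """WEBVTT

STYLE
#foo { color:green }

foo
00:00:00.000 --> 00:00:05.000
testing <i>testing</i>
"""

header, cues, dropped = encapsulate(SAMPLE)
```

With this input the style sheet ends up in `dropped` and only the timed
cue survives, which is the loss described above.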

If we could extend the WebVTT parser to have, say

WEBVTT
END

as the header and ignore everything between WEBVTT and END, then we
could do whatever we like in the header, including having blank lines.
It's not backwards compatible with the blank line mechanism, but it
might not be too late to introduce something like this.
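
A rough sketch of how a parser could skip such a header, in Python.
This is only one reading of the idea: `split_header` is a hypothetical
name, END is assumed to sit alone on its line, and a file without END
is rejected rather than falling back to the blank-line rule.

```python
def split_header(text):
    """Split a file into (header_lines, body_lines) at the END line.

    Assumptions for this sketch: the first line starts with WEBVTT,
    and the header runs until the first line that is exactly END.
    Blank lines inside the header are kept as-is.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    for i in range(1, len(lines)):
        if lines[i].strip() == "END":
            return lines[1:i], lines[i + 1:]
    raise ValueError("header is not terminated by END")


SAMPLE = (
    "WEBVTT\n"
    "language: fr\n"
    "\n"
    "STYLE\n"
    "#foo { color:green }\n"
    "\n"
    "END\n"
    "\n"
    "foo\n"
    "00:00:00.000 --> 00:00:05.000\n"
    "testing\n"
)

header_lines, body_lines = split_header(SAMPLE)
```

Everything in `header_lines` can then be ignored or interpreted without
affecting cue parsing, which starts from `body_lines`.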

Then we could have multi-line header fields with blank lines like this:

WEBVTT
language: fr
kind: subtitles

STYLE
#foo { color:green }
i { font-family:serif }

END

foo
00:00:00.000 --> 00:00:05.000
testing <i>testing</i>
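
The header part of this example could be interpreted roughly as
follows. This Python sketch rests on assumptions the example does not
spell out: a "name: value" line is a single-line field, a line made up
only of capital letters (such as STYLE) opens a multi-line block, and a
block runs until the next such line or the end of the header. The
function name and both patterns are hypothetical.

```python
import re

# Assumption: a block keyword is a line of capital letters only.
BLOCK_NAME = re.compile(r"^[A-Z]+$")
# Assumption: a field is "name: value" on a single line.
FIELD = re.compile(r"^([A-Za-z][\w-]*):\s*(.*)$")


def parse_header(header_lines):
    """Return (fields, blocks) for the lines between WEBVTT and END."""
    fields, blocks = {}, {}
    current = None  # name of the block being collected, if any
    for line in header_lines:
        stripped = line.strip()
        if BLOCK_NAME.match(stripped):
            current = stripped
            blocks[current] = []
        elif current is not None:
            # Inside a block every line is kept, blank lines included.
            blocks[current].append(line)
        elif stripped:
            match = FIELD.match(stripped)
            if match:
                fields[match.group(1)] = match.group(2)
    # Trim leading and trailing blank lines from each block body.
    bodies = {name: "\n".join(body).strip("\n") for name, body in blocks.items()}
    return fields, bodies


HEADER = [
    "language: fr",
    "kind: subtitles",
    "",
    "STYLE",
    "#foo { color:green }",
    "i { font-family:serif }",
    "",
]

fields, blocks = parse_header(HEADER)
```

Here `fields` holds the language and kind values and `blocks["STYLE"]`
holds the two CSS rules. A style sheet containing a line of capital
letters only would confuse this sketch, so a real syntax would need a
firmer rule for where a block ends.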

Silvia.

Received on Sunday, 2 September 2012 09:32:01 UTC