
Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 28 Sep 2012 11:52:01 +1000
Message-ID: <CAHp8n2mUPYGXpXfW_+GwO0s12E1Wtb_4RvdGHodv_65Md5nR9A@mail.gmail.com>
To: Glenn Maynard <glenn@zewt.org>
Cc: David Singer <singer@apple.com>, Simon Pieters <simonp@opera.com>, public-texttracks <public-texttracks@w3.org>
On Fri, Sep 28, 2012 at 11:42 AM, Glenn Maynard <glenn@zewt.org> wrote:
> On Thu, Sep 27, 2012 at 7:06 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>> If we finish the header area of a WebVTT file not on a blank line, but
>> on the first valid cue, then we don't need to escape anything really,
>> because it is quite unlikely to have a "time --> time" pattern in
>> anything but a cue. We might want to escape "-->" if necessary, but
>> that's all.
> FWIW, it would be nicer to instead change the "-->" error-recovery rule for
> the header loop to something more specific ("a --> that isn't within a
> multiline header or on a "Key: .*" single-line header"), and to use a unique
> line (e.g. ".") to end the header.  That avoids needing mid-line escapes (e.g.
> --&gt;), so only a single escape mechanism is needed.
> This can work if the parser knows about the format of headers, so the
> following (and variations) is parsable:
>     Font: http://fonts.com/my-->font.ttf
>     Style:
>     .foo { bar: "a --> b"; };
>     .foo2 { bar2 };
>     .
>     00:01.000 --> 00:02.000
>     text
> If the parser understands the format of headers, it can figure out that
> we're not, in fact, breaking out of the header region and into cues on that
> blank line.  It can understand that the first two "-->" are probably not
> mis-authored cues, since it's in a header and it's in the middle of a header
> block.  It can also detect that the last --> at the bottom *is* a
> mis-authored cue (that is, the blank line before the first cue is missing),
> since it's not within a header block.
> This maintains the error-recovery for the most common errors (forgetting the
> blank line), and doesn't require escaping anything except a lone "." (and
> the quote itself).
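Glenn's header-aware rule can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a spec algorithm: the function name `parse_header`, the field-name pattern, and the rule that a bare "Key:" line opens a multiline value running until the next "Name:" line or the lone "." terminator are all assumptions inferred from his example.

```python
import re
from typing import Dict, List, Optional, Tuple

# ASSUMPTION (hypothetical syntax): a field name is a letter followed by
# letters, digits, "_" or "-", immediately followed by a colon.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*):(.*)$")

HEADER_END = "."  # the lone-"." terminator line proposed in the thread


def parse_header(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split the lines after the WEBVTT signature into header fields and cue lines.

    "-->" only ends the header when it appears outside a header field;
    a line consisting solely of "." ends the header explicitly.
    """
    fields: Dict[str, str] = {}
    open_field: Optional[str] = None  # name of the multiline field being collected
    body: List[str] = []              # lines collected for the open multiline field

    def close_field() -> None:
        nonlocal open_field, body
        if open_field is not None:
            fields[open_field] = "\n".join(body)
        open_field, body = None, []

    for i, line in enumerate(lines):
        if line == HEADER_END:
            # Explicit end of the header; everything after it is cue data.
            close_field()
            return fields, lines[i + 1:]
        match = FIELD_RE.match(line)
        if match:
            # A "Name:" line always starts a new field, even inside a multiline value.
            close_field()
            name, rest = match.group(1), match.group(2).strip()
            if rest:
                fields[name] = rest  # single-line "Key: value"; any "-->" in it is ignored
            else:
                open_field = name    # "Key:" alone opens a multiline value
        elif open_field is not None:
            body.append(line)        # blank lines and "-->" are kept literally here
        elif "-->" in line:
            # Error recovery: a "-->" line outside any field means the author
            # forgot to end the header, so this line is treated as the first cue.
            return fields, lines[i:]
        # Any other stray line outside a field is ignored in this sketch.
    close_field()
    return fields, []


example = [
    "Font: http://fonts.com/my-->font.ttf",
    "Style:",
    '.foo { bar: "a --> b"; };',
    ".foo2 { bar2 };",
    ".",
    "00:01.000 --> 00:02.000",
    "text",
]
fields, cues = parse_header(example)
print(cues)  # ['00:01.000 --> 00:02.000', 'text']
```

Both "-->" occurrences inside the header survive as field content, while a timing line that appears outside any field still triggers the error recovery for a forgotten terminator.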

Then we have to be careful that a multiline value cannot contain a
"name:" pattern, since that would signal the start of a new metadata
header field.
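The collision can be illustrated with a small check. This is a sketch only: the field-start pattern below (a letter, then letters, digits, "_" or "-", then a colon at the very start of a line) is an assumed syntax, and `ambiguous_lines` is a hypothetical helper, not part of any proposal.

```python
import re
from typing import List

# ASSUMPTION (hypothetical syntax): a new header field starts with a name
# followed by a colon at the beginning of a line.
FIELD_START = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*:")


def ambiguous_lines(value_lines: List[str]) -> List[str]:
    """Return the lines of a multiline value that a header parser would
    misread as the start of a new "name:" field."""
    return [line for line in value_lines if FIELD_START.match(line)]


style = [
    ".foo {",
    "  color: red;",  # indented, so it does not begin with a name
    "}",
    "color: blue;",   # looks exactly like a new "color:" header field
]
print(ambiguous_lines(style))  # ['color: blue;']
```

Under this assumed syntax only the unindented `color: blue;` line collides; the indented line does not, because the pattern is anchored at the start of the line.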

> However (as we've talked about before) this would require
> backwards-incompatible changes to the parser.  Current parsers would drop
> out of the header loop at the first "-->", and if that wasn't there they'd
> drop out at the blank line.  That's going to apply to anything that doesn't
> require escaping blank lines and/or -->.  (That's the reason we went down
> the other escaping path in the first place.)
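To see why this is backwards-incompatible, here is a simplified model of the header loop exactly as Glenn describes it: stop at the first line containing "-->", otherwise at the first blank line. The function name `legacy_header_split` is hypothetical, and a real parser also handles the WEBVTT signature and other details omitted here.

```python
from typing import List, Tuple


def legacy_header_split(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Simplified model of the current header loop, per this thread:
    the header ends at the first line containing "-->", or else at the
    first blank line."""
    for i, line in enumerate(lines):
        if "-->" in line:
            # Treated as a cue timing line: the header ends just before it.
            return lines[:i], lines[i:]
        if line == "":
            # Blank line: the header ends and the blank line is consumed.
            return lines[:i], lines[i + 1:]
    return lines, []


example = [
    "Font: http://fonts.com/my-->font.ttf",
    "Style:",
    '.foo { bar: "a --> b"; };',
    ".foo2 { bar2 };",
    ".",
    "00:01.000 --> 00:02.000",
    "text",
]
header, rest = legacy_header_split(example)
print(header)  # []
```

An existing parser following this rule would see an empty header and try to read the `Font:` line as cue timing, which is why the new header syntax only works if the parser changes too.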

Making that change right now is not going to break anything yet, even
though the definition of the header changes, so now is a good time to
do it.

I would prefer to avoid a dot on a line by itself as the end marker:
it is too easily missed.

Received on Friday, 28 September 2012 01:52:48 UTC
