Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

From: Glenn Maynard <glenn@zewt.org>
Date: Thu, 27 Sep 2012 20:42:36 -0500
Message-ID: <CABirCh8_cHywhCmW_erbuMZS8o+9DYCshebC2kbsqe7Ci+Du1g@mail.gmail.com>
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: David Singer <singer@apple.com>, Simon Pieters <simonp@opera.com>, public-texttracks <public-texttracks@w3.org>
On Thu, Sep 27, 2012 at 7:06 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> If we finish the header area of a WebVTT file not on a blank line, but
> on the first valid cue, then we don't need to escape anything really,
> because it is quite unlikely to have a "time --> time" pattern in
> anything but a cue. We might want to escape "-->" if necessary, but
> that's all.
>

FWIW, it would be nicer to instead change the "-->" error-recovery rule for
the header loop to something more specific (a "-->" that isn't within a
multiline header or on a "Key: .*" single-line header), and to use a
unique line (e.g. ".") to end the header. That avoids needing mid-line
escapes (e.g. --&gt;), so only a single escape mechanism is needed.

This can work if the parser knows about the format of headers, so that the
following (and variations on it) are parsable:

> Font: http://fonts.com/my-->font.ttf
> Style:
> .foo { bar: "a --> b"; };
>
> .foo2 { bar2 };
> .
> 00:01.000 --> 00:02.000
> text

If the parser understands the format of headers, it can figure out that
we're not, in fact, breaking out of the header region and into cues on that
blank line. It can tell that the first two "-->" are probably not
mis-authored cues: the first is on a "Key: value" single-line header, and
the second is in the middle of a multiline header block. It can also detect
that the last "-->" at the bottom *is* a mis-authored cue (that is, one
where the blank line before the first cue is missing), since it's not
within a header block.
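A rough sketch of the header loop described above, to make the rule concrete. Everything in it is an assumption for illustration, not spec text or any existing parser: the `split_header` name, the key syntax in the regular expressions, the idea that a bare `Key:` line opens a multiline header that runs until the "." terminator, and that the `WEBVTT` signature line has already been consumed.

```python
import re
from typing import List, Tuple

# Hypothetical header syntax, assumed for this sketch:
#   "Key: value"  -> single-line header (the email's "Key: .*")
#   "Key:"        -> opens a multiline header that runs until the "." line
SINGLE_LINE_HEADER = re.compile(r"[A-Za-z][A-Za-z0-9_-]*: .*")
MULTILINE_OPENER = re.compile(r"[A-Za-z][A-Za-z0-9_-]*:")


def split_header(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split lines (after the WEBVTT signature) into (header, rest)."""
    in_multiline = False
    for i, line in enumerate(lines):
        if line == ".":
            # Explicit terminator: cues start on the following line.
            return lines[:i], lines[i + 1:]
        if in_multiline:
            # Blank lines and "-->" inside a block like "Style:" are content.
            continue
        if MULTILINE_OPENER.fullmatch(line):
            in_multiline = True
        elif "-->" in line and not SINGLE_LINE_HEADER.fullmatch(line):
            # Error recovery: the terminator was forgotten, so this is a cue.
            return lines[:i], lines[i:]
    return lines, []


example = [
    "Font: http://fonts.com/my-->font.ttf",
    "Style:",
    '.foo { bar: "a --> b"; };',
    "",
    ".foo2 { bar2 };",
    ".",
    "00:01.000 --> 00:02.000",
    "text",
]
header, cues = split_header(example)
print(cues)  # ['00:01.000 --> 00:02.000', 'text']
```

With the "." missing and no multiline block open, a line like "00:01.000 --> 00:02.000" right after "Kind: captions" still ends the header, which is the error recovery the rule is meant to keep.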

This maintains the error-recovery for the most common errors (forgetting
the blank line), and doesn't require escaping anything except a lone "."
(and the quote itself).
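The escape for a lone "." isn't specified here. One hypothetical option, loosely borrowed from SMTP's dot-stuffing: when writing a header, add one dot to any line made up only of dots, and remove one when reading. The function names and the exact rule are made up for this sketch.

```python
import re

# Hypothetical escape: only lines made up entirely of dots are touched,
# so a bare "." never appears inside an escaped header and can end it.
ALL_DOTS = re.compile(r"\.+")


def escape_header_line(line: str) -> str:
    # "." -> "..", ".." -> "...", everything else unchanged.
    return "." + line if ALL_DOTS.fullmatch(line) else line


def unescape_header_line(line: str) -> str:
    # Inverse of escape_header_line; a bare "." is the terminator and
    # should have been consumed before this is called.
    if len(line) > 1 and ALL_DOTS.fullmatch(line):
        return line[1:]
    return line


style = [".foo { bar: 1 }", ".", "", "a --> b"]
escaped = [escape_header_line(line) for line in style]
print(escaped)  # ['.foo { bar: 1 }', '..', '', 'a --> b']
```

Mid-line "-->" and ordinary lines starting with "." (such as CSS selectors) pass through untouched, which is the point of needing only one escape.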

However (as we've talked about before), this would require
backwards-incompatible changes to the parser. Current parsers drop
out of the header loop at the first "-->", and if there isn't one they
drop out at the first blank line. That affects any header syntax that
doesn't require escaping blank lines and/or "-->". (That's the reason we
went down the other escaping path in the first place.)
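To make the compatibility problem concrete, here is a simplified model of the current header loop as described above (an assumption for illustration, not the spec's parsing algorithm; `legacy_split_header` is a made-up name), applied to the same example.

```python
from typing import List, Tuple


def legacy_split_header(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Simplified model of today's header loop (after the WEBVTT line)."""
    for i, line in enumerate(lines):
        if "-->" in line:
            # Taken as a cue timing line whose preceding blank line is missing.
            return lines[:i], lines[i:]
        if line == "":
            # A blank line always ends the header.
            return lines[:i], lines[i + 1:]
    return lines, []


example = [
    "Font: http://fonts.com/my-->font.ttf",
    "Style:",
    '.foo { bar: "a --> b"; };',
    "",
    ".foo2 { bar2 };",
    ".",
    "00:01.000 --> 00:02.000",
    "text",
]
header, rest = legacy_split_header(example)
print(header)  # []
print(rest[0])  # Font: http://fonts.com/my-->font.ttf
```

In this model an existing parser sees no header at all: the Font line is taken as a cue, and even without it the header would end at the "-->" inside the Style block.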

-- 
Glenn Maynard
Received on Friday, 28 September 2012 01:43:04 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 28 September 2012 01:43:05 GMT