Re: File headers from Glenn Maynard on 2013-03-11 (public-texttracks@w3.org from March 2013)

From: Glenn Maynard <glenn@zewt.org>
Date: Sun, 10 Mar 2013 20:01:58 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: public-texttracks@w3.org
Message-ID: <CABirCh9=JwRkDCDujnnL7j5Qm70yd5N4_eHrGU2C6bEb=qpKBg@mail.gmail.com>
On Sun, Mar 10, 2013 at 6:21 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> If you have to escape an empty line with a "\", then you might as well not
> do inline styles, since you have to author them differently to a CSS file.
>

The intent is to allow a WebVTT editor, or a serialization and parsing
library, to represent blank lines, so the header format doesn't have the odd
limitation of being unable to represent something as basic as a blank line.
I don't expect people hand-authoring WebVTT files to use this.


> 3. "-->" error handling (step 14) is changed.  Header processing stops if
>> a line begins with a digit, contains the string "-->", and is not contained
>> within a multi-line header block.  This means this authoring error will
>> still recover:
>>
>> Author: Glenn
>> 00:11.000 --> 00:13.000
>> Hello
>>
>
> Wouldn't this end up being a cue in the current parser? So, this is a
> non-backwards compatible change?
>

By "recover", I mean that the timing line would still become a cue, just as
it does today, because the "digit ... -->" line doesn't lie inside a
multi-line header.

>> This is the case it won't fully recover from, causing the first cue to be
>> dropped, which I think is fine:
>>
>> Author: |
>> Glenn
>> 00:11.000 --> 00:13.000
>> Hello
>>
>
> If there are no blank lines at all before this cue, it will be dropped
> currently, too.
>

No, the current parser will recover this cue.  See step 14 and the "already
collected line" flag.  In this approach, the parser would now drop the cue
in the case of this authoring error.

This is the case I was talking about to begin with: this approach only
works if we can change the parser.
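To make the proposed step-14 rule concrete, here is a rough sketch of header parsing under it. It is not the spec's algorithm and doesn't model the current "already collected line" flag. It assumes: the lines passed in come after the WEBVTT signature line; headers are "Key: value" lines; "Key: |" opens a multi-line block closed by a line containing only "." (as in the example further down); and a blank line always ends the header area. The names parse_headers and ends_header are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed header syntax: "Key: value" on one line, or "Key: |" to open a
# multi-line block that is closed by a line containing only ".".
HEADER_RE = re.compile(r"^([^:]+):\s?(.*)$")


def ends_header(line: str, in_block: bool) -> bool:
    """Proposed rule: a line that begins with a digit and contains "-->"
    ends header processing, unless it lies inside a multi-line block."""
    return not in_block and line[:1].isdigit() and "-->" in line


def parse_headers(lines: List[str]) -> Tuple[Dict[str, str], int]:
    """Parse header lines (those after the WEBVTT signature line).

    Returns the headers and the index of the first line not consumed,
    i.e. where cue parsing would resume.
    """
    headers: Dict[str, str] = {}
    block_key: Optional[str] = None   # key of the open multi-line block
    block_lines: List[str] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "":
            break                     # a blank line always ends the headers
        if ends_header(line, in_block=block_key is not None):
            break                     # looks like a cue timing line
        if block_key is not None:
            if line == ".":           # a lone period closes the block
                headers[block_key] = "\n".join(block_lines)
                block_key, block_lines = None, []
            else:
                block_lines.append(line)
        else:
            m = HEADER_RE.match(line)
            if m and m.group(2) == "|":
                block_key, block_lines = m.group(1), []
            elif m:
                headers[m.group(1)] = m.group(2)
        i += 1
    if block_key is not None:
        # Unclosed block: everything it swallowed becomes its value.
        headers[block_key] = "\n".join(block_lines)
    return headers, i
```

With the first example ("Author: Glenn" followed by the timing line), this returns ({"Author": "Glenn"}, 1), leaving the timing line for the cue parser. With the unclosed "Author: |" example, the timing line and "Hello" are swallowed into the Author value and nothing is left for the cue parser; inserting either the closing "." or a blank line after "Glenn" recovers the cue.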

But be careful to distinguish which kind of backwards compatibility we're
talking about.  There's compatibility with existing content, which is usually
an overriding concern on the web, and there's compatibility with existing
implementations, which is what we're talking about here.  That's also an
important consideration, but it should be weighed against what the
incompatibility and its failure mode actually are.

In this case, the difference is that older implementations will recover the
cue when this error is made, while the new parser will drop the cue on the
floor (actually, it'll get subsumed into the multi-line header).  But the
difference can only arise when three things intersect: the author is using
multi-line headers, forgets to close one, and also omits the blank line
after the unclosed header.  To make it even less likely, anybody who's
writing multi-line headers is probably using an implementation that supports
them, where the error is visible.



> I'd like to stay backwards compatible with the current spec, since it's
> already implemented in 3 browsers. So, I think we can make multi-line
> headers work, but not blank lines. Since --> is currently only valid
> between two complete time specs and after a blank line, I think we can
> safely adjust the parsing of it to be allowed in headers as long as we
> don't introduce blank lines.
>

This sounds backwards.  Blank lines can be supported with complete
backwards compatibility using the escaping mechanism I described.  It's
impossible to include "-->" in headers without either changing the parser
or escaping it directly (e.g. "--&gt;").  (It does seem too weird to be
unable to represent "-->" in headers at all, and I'm trying to avoid the
ugliness of needing two different forms of escaping...)

Another approach is to use &gt;-style escaping only.  To allow blank
lines, add an "&nl;" escape.  That is:

Style: |
Line 1 --&gt;
Line 2&nl;Line 3
Line 4, the next line contains only a period:&nl;.
Line 6, the next line is blank:&nl;
Line 8
.
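A minimal decoder for this "&nl;" proposal might look like the sketch below. It assumes the block body is the physical lines after "Style: |", terminated by a lone "." line, and that the recognized escapes are &gt;, &lt;, &amp; and the proposed &nl;. Only &gt; and &nl; are named above; &lt; and &amp; are assumed from "HTML-escape". The function names are hypothetical.

```python
import re
from typing import List

# Assumed escape set: the familiar HTML ones plus the proposed "&nl;".
_ESCAPES = {"&gt;": ">", "&lt;": "<", "&amp;": "&", "&nl;": "\n"}
_ESCAPE_RE = re.compile("|".join(re.escape(k) for k in _ESCAPES))


def unescape(text: str) -> str:
    # A single left-to-right pass, so "&amp;nl;" decodes to the literal
    # text "&nl;" instead of being decoded a second time into a newline.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], text)


def decode_block(lines: List[str]) -> str:
    """Decode the body of a multi-line header (the lines after "Key: |")."""
    body: List[str] = []
    for line in lines:
        if line == ".":        # a lone, unescaped period ends the block
            break
        body.append(line)
    # Physical line breaks become "\n" first; escapes are expanded afterwards,
    # so an escaped "." or blank line is never mistaken for the terminator.
    return unescape("\n".join(body))
```

Run on the example above, this yields eight logical lines, "Line 1 -->" through "Line 8", with a lone "." as line 5 and an empty line 7.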

I like a few things about this.  It's consistent with the escaping used in
cues (which are also HTML/XML-like), and it relies only on an escaping
mechanism that's already well understood by Web authors, instead of
inventing something new.  It does mean HTML-escaping CSS, but ampersands are
rare in CSS, so this doesn't seem too annoying.
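The HTML-escaping cost falls on whatever serializes header values. Here is a sketch of such a serializer, under the same assumptions (an &amp;/&lt;/&gt;/&nl; escape set and a "."-terminated block). The name encode_block is hypothetical, and this sketch does not handle a value that is exactly ".", since no escape for that is defined above.

```python
from typing import List

# Physical lines that would end the header area or close the block.
_TERMINATING = ("", ".")


def encode_block(value: str) -> List[str]:
    """Serialize a header value into the physical lines that follow
    "Key: |", including the closing "." line."""
    # Escape "&" first so the entities added below aren't double-escaped.
    text = value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    physical: List[str] = []
    for logical in text.split("\n"):
        # An empty or lone "." physical line would end the header or close
        # the block, so join it to its neighbour with "&nl;" instead.
        if physical and (logical in _TERMINATING or physical[-1] in _TERMINATING):
            physical[-1] += "&nl;" + logical
        else:
            physical.append(logical)
    if physical == [""]:
        physical = []  # an empty value is just an empty block
    elif physical == ["."]:
        raise ValueError("a value of exactly '.' is not handled by this sketch")
    return physical + ["."]
```

Because ">" is always escaped, "-->" never appears literally in a header written this way, so the "digit ... -->" check can't fire inside it. Fed the decoded example, this reproduces the block above, except that "Line 2&nl;Line 3" comes out as two physical lines, which decodes identically.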

-- 
Glenn Maynard
Received on Monday, 11 March 2013 01:02:26 UTC