Re: File headers from Silvia Pfeiffer on 2013-03-11 (public-texttracks@w3.org from March 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Mon, 11 Mar 2013 12:34:29 +1100
To: Glenn Maynard <glenn@zewt.org>
Cc: public-texttracks@w3.org
Message-ID: <CAHp8n2nEDg7oKqx+75Cq_9SfQkZzBRp58GO4jdo-N1OF75z2=A@mail.gmail.com>
On Mon, Mar 11, 2013 at 12:01 PM, Glenn Maynard <glenn@zewt.org> wrote:

> On Sun, Mar 10, 2013 at 6:21 PM, Silvia Pfeiffer <
> silviapfeiffer1@gmail.com> wrote:
>
>> If you have to escape an empty line with a "\", then you might as well
>> not do inline styles, since you have to author them differently to a CSS
>> file.
>>
>
> The intent is to allow a WebVTT editor, or serialization and parsing
> library, to represent blank lines, so the header format doesn't have a
> weird exception of not being able to represent something as basic as a
> blank line.  I don't expect people hand-authoring WebVTT files to use this.
>
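For illustration, here is one way an editor or serialization library could round-trip blank lines through such an escape. The exact escaping rule is not spelled out in this message. The scheme below is a hypothetical assumption: a blank line is written as a lone "\", and a literal leading "\" is doubled. The names escape_value and unescape_value are also hypothetical.

```python
# Hypothetical escaping scheme (an assumption, not the rule from the
# original proposal): a blank line inside a multi-line header value is
# written as a single "\", and any line that already starts with "\"
# gets one extra leading "\" so that decoding stays unambiguous.

def escape_value(lines):
    """Serialize header-value lines so that no emitted line is blank."""
    out = []
    for line in lines:
        if line == "" or line.startswith("\\"):
            out.append("\\" + line)
        else:
            out.append(line)
    return out


def unescape_value(lines):
    """Invert escape_value by stripping exactly one leading backslash."""
    return [line[1:] if line.startswith("\\") else line for line in lines]


value = ["body { color: red }", "", "\\not-a-blank"]
encoded = escape_value(value)
assert "" not in encoded
assert unescape_value(encoded) == value
```

It also illustrates the objection in the reply. A CSS file pasted in verbatim keeps its real blank lines, and each of those would have to be rewritten as a "\" line first.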


We can't exclude that possibility. I'd expect a lot of cut-and-paste
authoring, which would totally break for CSS.



>
>
>> 3. "-->" error handling (step 14) is changed.  Header processing stops if
>>> a line begins with a digit, contains the string "-->", and is not contained
>>> within a multi-line header block.  This means this authoring error will
>>> still recover:
>>>
>>> Author: Glenn
>>> 00:11.000 --> 00:13.000
>>> Hello
>>>
>>
>> Wouldn't this end up being a cue in the current parser? So, this is a
>> non-backwards compatible change?
>>
>
> By recover, I mean that it would become a cue, just as it does today,
> because the "digit ... -->" string doesn't lie in a multi-line header.
>
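Here is a minimal sketch of the modified step 14 quoted above. It assumes that a multi-line block opens with a line ending in ": |" and closes with a line that is exactly ".". That syntax is inferred from the examples in this thread, not taken from the spec, and the name split_header is hypothetical. The sketch reproduces both outcomes described here: the "Author: Glenn" example recovers its cue, while the unclosed-block example that follows swallows it.

```python
import re

# Sketch of the proposed header-termination rule (step 14 as modified in
# this thread). Assumptions for illustration: a multi-line header block
# opens with a line ending in ": |" and closes with a line that is
# exactly "."; `lines` are the lines after the "WEBVTT" signature.

BLOCK_OPEN = re.compile(r":\s*\|$")


def split_header(lines):
    """Return (header_lines, remaining_lines)."""
    in_block = False
    for i, line in enumerate(lines):
        if in_block:
            # Inside a block everything, including "-->", is header text.
            if line == ".":
                in_block = False
            continue
        if line == "":
            # A blank line outside a block ends the header.
            return lines[:i], lines[i + 1:]
        if line[:1].isdigit() and "-->" in line:
            # Looks like a cue timing line: stop and hand it to the cue
            # parser (the "recover" case).
            return lines[:i], lines[i:]
        if BLOCK_OPEN.search(line):
            in_block = True
    # Header never terminated: everything is swallowed.
    return lines, []


# Recovers: the timing line is outside any block.
print(split_header(["Author: Glenn", "00:11.000 --> 00:13.000", "Hello"]))
# Unclosed block: the cue is subsumed into the header.
print(split_header(["Author: |", "Glenn", "00:11.000 --> 00:13.000", "Hello"]))
```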


I'd prefer if we could define cues to require a blank line in front of
them. In this case, it would become a cue only if "Author: Glenn" is
accidentally detached from the rest of the header and turns into the
identifier of the cue.



> This is the case it won't fully recover from, causing the first cue to be
>>> dropped, which I think is fine:
>>>
>>> Author: |
>>> Glenn
>>> 00:11.000 --> 00:13.000
>>> Hello
>>>
>>
>> If there are no blank lines at all before this cue, it will be dropped
>> currently, too.
>>
>
> No, the current parser will recover this cue.  See step 14 and the
> "already collected line" flag.  In this approach, the parser would now drop
> the cue in the case of this authoring error.
>

Did you mean s/now/not/?

If there is not a single blank line between the header and the cue, then
the cue should be dropped. I agree that step 14 is a problem. I think step 14
should only be executed after an empty line has been parsed. That would
remove the whole need for escaping "-->". It does cause some files that
forgot the blank line to fail, but I think that's a fair enough breakage.
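Sketched the same way, this proposal needs no "-->" escaping, but it drops the first cue of a file that forgot the blank line. The sketch simplifies by assuming the header is every line after the signature up to the first blank line, and the function name is hypothetical:

```python
# Sketch of the alternative suggested above: the "-->" check is applied
# only once a blank line has been seen, so everything before the first
# blank line is header, including lines that contain "-->".
# Simplification: `lines` are the lines after the "WEBVTT" signature.

def split_header_blank_line_first(lines):
    """Return (header_lines, remaining_lines)."""
    for i, line in enumerate(lines):
        if line == "":
            return lines[:i], lines[i + 1:]
    return lines, []


# A header containing "-->" needs no escaping.
hdr, rest = split_header_blank_line_first(
    ["Note: a --> b", "", "00:11.000 --> 00:13.000", "Hello"])
assert hdr == ["Note: a --> b"]

# A file that forgot the blank line loses its first cue into the header.
hdr, rest = split_header_blank_line_first(
    ["Author: Glenn", "00:11.000 --> 00:13.000", "Hello"])
assert rest == []
```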



> This is the case I was talking about to begin with: this approach only
> works if we can change the parser.
>
> But, be careful to distinguish the backwards-compatibility we're talking
> about.  There's compatibility with existing content, which is usually an
> overriding concern on the web, and there's compatibility with existing
> implementations, which is what we're talking about here.  It's also an
> important consideration, but it should be taken in context with what the
> incompatibility and failure mode is.
>

Agreed. I care about both kinds of backwards compatibility, but on the
implementation side only for implementations that followed the current spec
(apart from step 14 ;-).


> In this case, the difference is that older implementations will recover
> the cue when this error is made, and the new parser will drop the cue on
> the floor (actually, it'll get subsumed into the multi-line header).  But,
> the difference can only happen when three things intersect: when using
> multi-line headers, the author forgets to close the multi-line header, and
> the author forgets the blank line after the missing header close.  To make
> it even less likely, anybody who's writing multi-line headers is probably
> using an implementation that supports them, where the error is visible.
>

I think we could live with that. It's a very unlikely error scenario.



>
>> I'd like to stay backwards compatible with the current spec, since it's
>> already implemented in 3 browsers. So, I think we can make multi-line
>> headers work, but not blank lines. Since --> is currently only valid
>> between two complete time specs and after a blank line, I think we can
>> safely adjust the parsing of it to be allowed in headers as long as we
>> don't introduce blank lines.
>>
>
> This sounds backwards.  Blank lines can be implemented with complete
> backwards-compatibility with the escaping mechanism I described.
>

Except that they are not blank lines, but escaped blank lines. That's what
I'm objecting to.



>  It's impossible to include "-->" in headers without either a parser
> change, or by escaping it directly (e.g. "--&gt;").  (It does seem a bit too
> weird to not be able to represent "-->" in headers at all, and I'm trying
> to avoid the ugliness of needing two different forms of escaping...)
>

Right, that's because of step 14. If we change step 14 as proposed, headers
can include --> without a need for escaping.


>
> Another approach is to use &gt;-style escaping only.  In order to allow
> blank lines, add an "&nl;" escape.  That is,
>
> Style: |
> Line 1 --&gt;
> Line 2&nl;Line 3
> Line 4, the next line contains only a period:&nl;.
> Line 6, the next line is blank:&nl;
> Line 8
> .
>
> I like a few things about this.  It's consistent with the escaping used in
> cues (which are also HTML/XML-like), and it only uses an escaping mechanism
> that's already well-understood by Web authors, instead of having to make up
> something new.  It means needing to HTML-escape CSS, though ampersands are
> rare in CSS, so this doesn't seem too annoying.
>
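Here is a minimal decoding sketch for the &nl; proposal. It assumes that the block ends at a line that is exactly ".", that source lines are joined with newlines, and that only &nl;, &gt;, &lt; and &amp; are recognized. The name decode_block is hypothetical. Applied to the example above, it yields the eight lines the example describes, with line 5 a lone period and line 7 blank.

```python
import re

# Sketch of the "&gt;-style escaping only" alternative. Assumptions: the
# block runs from the line after "Key: |" up to a line that is exactly
# ".", source lines are joined with "\n", and only &nl;, &gt;, &lt; and
# &amp; are recognized (&nl; being the newly proposed escape).

ENTITIES = {"&nl;": "\n", "&gt;": ">", "&lt;": "<", "&amp;": "&"}
ENTITY_RE = re.compile("|".join(map(re.escape, ENTITIES)))


def decode_block(lines):
    """Decode one multi-line header value; return (value, lines_consumed)."""
    body = []
    for i, line in enumerate(lines):
        if line == ".":
            raw = "\n".join(body)
            # Single left-to-right pass, so "&amp;gt;" decodes to "&gt;".
            return ENTITY_RE.sub(lambda m: ENTITIES[m.group(0)], raw), i + 1
        body.append(line)
    raise ValueError("unterminated multi-line header block")


source = [
    "Line 1 --&gt;",
    "Line 2&nl;Line 3",
    "Line 4, the next line contains only a period:&nl;.",
    "Line 6, the next line is blank:&nl;",
    "Line 8",
    ".",
]
value, used = decode_block(source)
print(value)
```

A single regex pass is used instead of chained str.replace calls so that an escaped ampersand such as "&amp;gt;" is not decoded twice.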

The more I think about CSS, the more I'd prefer to force it to be in an
external stylesheet.

Silvia.
Received on Monday, 11 March 2013 01:35:18 UTC