Re: File headers

On Sat, Mar 9, 2013 at 1:33 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> Allowing empty lines in headers will require us to make non-backwards
> compatible changes to WebVTT. Or do you have an idea on how to achieve that?
>

I'll review my most recent proposal below.


> That's fine by me, but other than CSS what multi-line metadata do we need?
> So, should we even bother?
>

I don't think we have other use cases yet, but I'd hate to get locked into
single-line headers for good and then get bitten by it down the line; it
might no longer be possible to add multi-line headers later if we ship only
a single-line format now...

> I agree with 2, since it's backwards compatible. But we do need to figure
> out how to make the header work. Any ideas welcome!
>

Here's a review of the last proposal I made (updated with later parts of
the discussion, to deal with "-->" without needing to escape it).  Note
that the escape character ("\") and termination character (".") are
arbitrary, and we could use something else (eg. "|" and "##") if we think
it's more convenient or easier to read.

Author: Glenn
Style: |
p {
 font-size: 100px;
}
The following is a blank line:
\
The following line contains only ".":
\.
.

In detail:

1. When parsing a line inside a multi-line header, if its first character is
a backslash, remove it.  This is the only escaping mechanism in the format.
It permits blank lines and lines consisting only of a period, and it lets
the escape character itself be escaped.  Only a single leading backslash is
removed, not backslashes in the middle of the line, ie. it's exactly this:
if(line.substr(0,1) == "\\") line = line.substr(1);
2. Header keys ("Author") must not start with a digit (in order to make the
next point work).  Other restrictions may make sense (eg. to allow an API
like dataset), but this is the only one we need for this to work.
3. "-->" error handling (step 14) is changed.  Header processing stops if a
line begins with a digit, contains the string "-->", and is not contained
within a multi-line header block.  This means this authoring error (no blank
line between the headers and the first cue) will still recover:

Author: Glenn
00:11.000 --> 00:13.000
Hello

This requires no escaping, because cue timing lines always begin with a
digit, and header keys never do, so we know it can't be a cue timing line:

Source: http://website.com/hello-->world

This also requires no escaping, because the line is within a multi-line
header:

Notes: |
1. These translations are bad <-- Fix them -->
.
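
To make the three cases above concrete, here is a minimal sketch of the
stop check from point 3.  The function name and its arguments are made up
for illustration and aren't part of the proposal or of any parser; the only
logic it encodes is "begins with a digit, contains '-->', and is not inside
a multi-line block".

```javascript
// Hypothetical helper (name and signature are illustrative only).
// Returns true when header processing should stop because the line
// looks like a cue timing line, per point 3.
function endsHeaderProcessing(line, inMultiLineBlock) {
  // Inside a "Key: |" ... "." block, nothing is treated as a cue timing line.
  if (inMultiLineBlock) return false;
  // Header keys never start with a digit (point 2), cue timings always do.
  const startsWithDigit = /^[0-9]/.test(line);
  return startsWithDigit && line.includes("-->");
}

// The three examples above, in order.
console.log(endsHeaderProcessing("00:11.000 --> 00:13.000", false));
console.log(endsHeaderProcessing("Source: http://website.com/hello-->world", false));
console.log(endsHeaderProcessing("1. These translations are bad <-- Fix them -->", true));
```

Only the first call reports a stop; the Source line fails the digit test
and the Notes line is protected by being inside the block.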

This is the case it won't fully recover from (the multi-line block is never
terminated with "."), causing the first cue to be dropped, which I think is
fine:

Author: |
Glenn
00:11.000 --> 00:13.000
Hello
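
Putting the three points together, a rough sketch of the whole header pass
might look like the following.  Several things here are my assumptions for
the sake of a runnable example, not settled parts of the proposal: the
function name and return shape, splitting key and value at the first ":",
joining block lines with "\n", and treating an unescaped blank line as the
end of the header section (which also closes a block that is still open).

```javascript
// Sketch only; parseHeaders and its return shape are hypothetical.
// Assumptions: "Key: |" opens a block, block lines are joined with "\n",
// and an unescaped blank line ends the header section.
function parseHeaders(lines) {
  const headers = {};
  let blockKey = null;   // key of the open multi-line block, if any
  let blockLines = [];
  let i = 0;

  for (; i < lines.length; i++) {
    let line = lines[i];

    if (blockKey !== null) {
      if (line === "." || line === "") {
        // "." terminates the block; a blank line closes it and ends the headers.
        headers[blockKey] = blockLines.join("\n");
        blockKey = null;
        if (line === "") break;
        continue;
      }
      // Point 1: strip exactly one leading backslash, nothing else.
      if (line.startsWith("\\")) line = line.slice(1);
      blockLines.push(line);
      continue;
    }

    if (line === "") break;                                   // end of headers
    if (/^[0-9]/.test(line) && line.includes("-->")) break;   // point 3

    const sep = line.indexOf(":");
    if (sep === -1) continue;   // not a header line; ignored in this sketch
    const key = line.slice(0, sep);
    const value = line.slice(sep + 1).trim();
    if (value === "|") {
      blockKey = key;
      blockLines = [];
    } else {
      headers[key] = value;
    }
  }

  // Input ended while a block was still open.
  if (blockKey !== null) headers[blockKey] = blockLines.join("\n");
  return { headers, rest: lines.slice(i) };
}

// The unterminated-block case above: the first cue ends up inside "Author".
const result = parseHeaders([
  "Author: |", "Glenn", "00:11.000 --> 00:13.000", "Hello", "",
  "00:14.000 --> 00:15.000", "World",
]);
console.log(result.headers.Author);
console.log(result.rest);
```

With this input the Author value swallows the first timing line and "Hello",
and the remaining lines start at the blank line before the second cue, which
matches the "first cue is dropped" behaviour described above.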

There are some other details that we talked about before (how to have a
single-line header of "|"; the particulars of which newlines are part of
the data and which aren't), but I'll wait before going over those so we can
talk about the rest.

-- 
Glenn Maynard

Received on Saturday, 9 March 2013 17:06:08 UTC