Re: meta-data in the VTT file header, a strawman proposal

On Sat, Apr 21, 2012 at 2:39 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Fri, Apr 20, 2012 at 10:59 PM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>>
>> > On Fri, Apr 20, 2012 at 12:01 AM, David Singer <singer@apple.com> wrote:
>> >> Yes, I think we want some readability.  And maybe assuming all values
>> >> could be on one line is bad.
>> >
>> > (I agree with the first, but the second isn't a problem: any JSON
>> > document
>> > can be stored on one line.)
>>
>> I believe David wanted to make sure values don't *have* to be on one
>> line, which your proposal seemed to suggest?
>
>
> I mean: it's perfectly safe to assume that all values *can* be stored on a
> single line with JSON; the second (assuming all values *can* be on one line)
> is safe.  Newlines within the string are encoded with \n (per JSON), and
> there's no inherent limit to how long a line can be.
>
> But it's the first that's the problem--it's ugly and hard to read with a
> text editor, even though it would work.
>
>> > Key: value
>> >  value
>> >   value2
>> >
>> > would decode to "value\nvalue\n value2".
>>
>> Incidentally, multi-line values in HTTP headers also require white space
>> at the start of each continuation line.
>
>
> Right, but it's one or more of any whitespace, so you can't tell how much
> whitespace was there to begin with.  If it's exactly one space (like
> patches), it doesn't have that problem.
>
>> I am not sure we can, though, without changing the parsing of WebVTT.
>
>
> We could use a simple escape mechanism: if a header value line begins with
> a backslash, remove that backslash.  So, you get the following:
>
> Key: |
> line 1
> \.
> \
> \\
> .
>
> In effect, "\." at the start of a line represents a period, "\" on a line by
> itself represents a blank line, and "\\" at the start of a line is a single
> backslash (escaping the escape character), but defined as a single trivial
> rule instead of a list of escape sequences.
>
> This means--unless I'm missing a case--that *any* valid block of UTF-8 text
> will round-trip.  We don't strictly need that now (the only use case so far
> for multiline comments is CSS), but it seems like a useful property to have
> going forward.
>
> Also, these are infrequent enough that they wouldn't uglify source very
> much.
>
> (One tangential detail: the final newline before the terminating "." line
> should not be included in the resulting header data, or else it would be
> impossible to encode a string that doesn't end with a newline.)
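
As a sanity check on the round-trip claim quoted above, here is a minimal
Python sketch of the escape rule. The function names encode_value and
decode_value are made up for illustration, and the sketch assumes lines are
separated by "\n" only; it is not taken from any WebVTT implementation.

```python
def encode_value(text: str) -> list[str]:
    """Turn a string into the body lines of a "Key: |" block, terminator included."""
    out = []
    for line in text.split("\n"):
        # Prefix a backslash where the raw line would be misread: a lone "."
        # (the terminator), a blank line, or a line already starting with "\".
        if line in ("", ".") or line.startswith("\\"):
            line = "\\" + line
        out.append(line)
    out.append(".")  # terminating dot line
    return out


def decode_value(lines: list[str]) -> str:
    """Inverse of encode_value: read body lines up to the terminating "." line."""
    body = []
    for line in lines:
        if line == ".":
            break
        # The single trivial rule: drop one leading backslash if present.
        body.append(line[1:] if line.startswith("\\") else line)
    else:
        raise ValueError("missing terminating '.' line")
    # Joining without a trailing "\n" means the newline before the dot line
    # is not part of the value.
    return "\n".join(body)
```

Joining the decoded lines with "\n" and adding nothing at the end is what
keeps the final newline out of the value, so strings with and without a
trailing newline both survive the round trip.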


OK, it seems we've just created ourselves a new name-value format
spec. As much as I wanted to avoid this (sigh), I think this is the
simplest we've had thus far.

Something like:

1.
Name-value pairs of header metadata are given with a name-string
separated from the value by a colon.
No control characters or separators are allowed in the name.
No white space is allowed between the name and the colon (?).

2.
If the value is a single "|" character, the value is multi-line: it
starts on the next line and ends at a line that contains only a
single dot.
The newline just before the dot-line is not part of the value.


A quick-and-dirty ABNF could be:

metadata-header = field-name ":" field-value
field-name = token
field-value = ("|" CRLF *TEXT CRLF "." CRLF) | (*TEXT without CRLF)
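
To make rules 1 and 2 concrete, here is a rough Python sketch of a parser
for such a header block. The name parse_metadata is hypothetical, stripping
white space around the value is my assumption (the rules above do not say),
and the backslash escape from the quoted proposal is deliberately left out,
so a multi-line value here cannot contain a line that is only a dot.

```python
def parse_metadata(lines: list[str]) -> dict[str, str]:
    """Parse header lines (already split on newlines) into a name -> value dict."""
    headers = {}
    it = iter(lines)
    for line in it:
        name, sep, value = line.partition(":")
        # Rule 1: a name, then a colon, with no white space in or after the name.
        if not sep or not name or any(c.isspace() for c in name):
            raise ValueError(f"malformed header line: {line!r}")
        value = value.strip()  # assumption: surrounding white space is dropped
        if value == "|":
            # Rule 2: the value starts on the next line and runs up to a line
            # containing only a single dot.
            body = []
            for cont in it:
                if cont == ".":
                    break
                body.append(cont)
            else:
                raise ValueError(f"unterminated multi-line value for {name!r}")
            # No trailing "\n": the newline before the dot line is not included.
            value = "\n".join(body)
        headers[name] = value
    return headers
```

For example, the lines "Kind: captions", "Style: |", "::cue { color: red }",
"." would give a "Style" value of "::cue { color: red }"; the colons inside
the multi-line body are not treated as separators because the inner loop
consumes those lines first.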


Sound roughly right?

Cheers,
Silvia.

Received on Saturday, 21 April 2012 06:11:45 UTC