Re: meta-data in the VTT file header, a strawman proposal

On Tue, Apr 24, 2012 at 10:10 AM, David Singer <singer@apple.com> wrote:
> On Apr 23, 2012, at 16:00 , Glenn Maynard wrote:
> On Sat, Apr 21, 2012 at 1:10 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> Something like:
>>
>> 1.
>> Name-value pairs of header metadata are given with a name-string
>> separated from the value by a colon.
>> No control characters or separators are allowed in the name value.
>> No white space is allowed between the name and the colon (?).
>>
>> 2.
>> If the value is a single "|" character, the value is multi-line,
>> starting on the next line and ending with a line that only contains a
>> single dot.
>> The newline just before the dot-line is also not part of the value.
>>
>>
>> A quick-and-dirty ABNF could be:
>>
>> metadata-header = field-name ":" field-value
>> field-name = token
>> field-value = ("|" *TEXT CRLF "." CRLF) | (*TEXT without CRLF)
>
>
> (I'm not sure; it looks more or less right, but reading ABNFs has always
> given me a headache.)
>
> Some other details:
>
> Presumably, whitespace between the colon and a single-line value would be
> stripped, e.g.
>
> Key:     Value
>
> would result in "Key" = "Value".  If you have significant leading whitespace
> in the value you want to preserve, or if you need to encode the string "|"
> itself, then switch to the block format:
>
> Key: |
>     Value
> .
>
> Key: |
> |
> .
>
>
> Yes.  Then all we need to add is
> "In multi-line values, a line that either (a) starts with the escape
> character

There is no escape character. I don't think we need one.


> or (b) is blank (safer, visually blank)

We can't do blank lines or we break the WEBVTT parsing algorithm. I
think we will have to just accept that WebVTT headers can't have blank
lines.


> or (c) consists of the
> termination sequence (a single period) must be escaped by having a "\"
> prepended. On receipt, gather the lines up to the final terminator (".")
> and remove all leading "\" characters."

We haven't introduced an escape character at this stage. The only
place where we'd need one is if we really needed a multi-line value
with a "." on a line by itself. Is this case likely enough that we
have to deal with it? Is there a way around it with a UTF-8 character?


> If we want total flexibility, remove the line-break before the "." line, so
> you can end without the line-end character if you want to (you can always
> put it back explicitly with an escaped blank line).

I think that's not as easy to parse, or to spot visually, as just "."
on a line by itself. It's also easy to forget the terminator when it
sits at the end of a line, so I'd rather have it on a line by itself.
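
To make the rules discussed so far concrete, here is a rough Python
sketch of how a parser for them might look. It is only an illustration
of the strawman, not a proposed algorithm: the function name
parse_header_metadata is made up, stripping trailing whitespace from
single-line values is my assumption, and so is treating a blank line as
the end of the header block. There is no escape character, so a
multi-line value cannot contain a line consisting only of ".".

```python
def parse_header_metadata(lines):
    """Parse header lines (already split, no line terminators) into a dict.

    Hypothetical helper sketching the strawman; not part of any spec.
    """
    result = {}
    it = iter(lines)
    for line in it:
        if line == "":
            break  # assumption: a blank line ends the header block
        name, sep, value = line.partition(":")
        # No white space is allowed between the name and the colon.
        if not sep or not name or name != name.rstrip():
            raise ValueError(f"malformed header line: {line!r}")
        # Whitespace after the colon is stripped (trailing too: assumption).
        value = value.strip()
        if value == "|":
            # Block format: collect lines up to a line that is only ".".
            block = []
            for body in it:
                if body == ".":
                    break
                if body == "":
                    raise ValueError("blank line inside multi-line value")
                block.append(body)
            else:
                raise ValueError(f"unterminated value for {name!r}")
            # Joining adds no trailing newline, so the newline just
            # before the dot-line is not part of the value.
            value = "\n".join(block)
        result[name] = value
    return result
```

With this sketch, "Key:     Value" yields "Key" = "Value", while the block
form keeps leading whitespace and can carry a literal "|" as the value.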

Regards,
Silvia.

Received on Tuesday, 24 April 2012 00:47:33 UTC