Re: meta-data in the VTT file header, a strawman proposal

On Fri, Apr 20, 2012 at 3:17 PM, David Singer <singer@apple.com> wrote:
>
> On Apr 20, 2012, at 13:02 , Silvia Pfeiffer wrote:
>
>> On Fri, Apr 20, 2012 at 1:45 PM, David Singer <singer@apple.com> wrote:
>>>
>>> On Apr 20, 2012, at 11:04 , Silvia Pfeiffer wrote:
>>>
>>>> On Thu, Apr 19, 2012 at 6:29 PM, David Singer <singer@apple.com> wrote:
>>>>>
>>>>> On Apr 11, 2012, at 10:10 , Silvia Pfeiffer wrote:
>>>>>
>>>>>>> RFC 822 generally considers values as "one long line that can be folded if it's too long", and I am not sure that's right for us. I think that line-breaks can be significant in some of the values we want, no? (Such as CSS).
>>>>>>
>>>>>> Do we need empty lines?
>>>>>
>>>>> it makes style-sheets readable, and aren't line-breaks significant in style-sheets?
>>>>
>>>> No, line breaks are not significant in style sheets. You can specify a
>>>> CSS file in a single line. In fact, all web page compression programs
>>>> do this. Semicolons are significant though.
>>>>
>>>> As for line breaks in RFC822 header syntax: as long as you put a blank
>>>> or other whitespace character at the beginning of the line, RFC822
>>>> header syntax allows it to be a continuation line and folds it in. We
>>>> would just need to make sure WebVTT doesn't recognize that as the end
>>>> of the header section and start trying to parse cues.
>>>>
>>>>
>>>>> And we don't *want* actual empty lines, as simple parsers will think the cue-text is coming next.
>>>>
>>>> OK, then RFC822 header syntax seems adequate.
>>>
>>>
>>> Yes, but the transformation you have to exercise to make something into the format is quite significant; and it's non-reversible.
>>
>> You mean introducing blanks at the beginning of lines?
>>
>>
>>>  What I suggested was a need to escape blank lines and one character sequence (]]).  For 822, you have to remove all blank lines and indent all other lines.
>>
>> You won't have to remove blank lines, just add a blank character at
>> the start of them.
>
> ouch. that's fragile; lines that appear to be empty but actually consist of a single white-space. We're laying ourselves open to endless misunderstanding if blank lines 'appear' OK but actually need to have an invisible space character in them.
>
>>
>>
>>>  Basically, we're gambling that those changes are not significant for anything we want to embed. We think it's OK for CSS, but our use case is very different from 822.  In mail headers, the design of the headers themselves is controlled.  We, on the other hand, don't really want to restrict what *could* be an attribute-value.
>>>
>>> So, 822 may look OK now, but it represents a non-reversible transformation, changes every line of a multi-line attribute value by requiring that all text be indented, and assumes all multi-line values are logically a single line with insignificant breaks. That worries me.
>>
>> Well, I didn't mean to imply the part of RFC822 that makes multi-line
>> values into a single line. So, maybe we can agree on RFC822 with
>> caveats, e.g. that newlines are retained, but that they just mean that
>> all the values belong to the same header field.
>>
>>
>> I suppose the options we have for solving multi-line metadata fields are:
>>
>> 1. specify them only on one line with an explicit separator like the
>> "\n" character that Glenn suggested
>>
>> 2. specify them over multiple lines where a blank (or similar
>> white-space character) at the start signifies a continuation of the
>> value
>>
>> 3. specify a special character that starts and ends a multi-line
>> field; this also requires specifying a way to escape that character in
>> the value (e.g. "[[" is often used in wiki markup, so I wouldn't
>> regard it as uncommon - we'd need to escape it)
>
> Note that ]] on a line by itself is pretty rare, even in a wiki, no?  That's the only case that needs escaping;  ]] on a line by itself, and blank (and for safety, apparently blank) lines.  Otherwise, the text goes through unmodified.  So lines containing ]] are fine.
>
> Since we're talking about existing implementations, this is almost exactly what SMTP uses for the body of a mail message, though there the terminator is a period (".") on a line by itself, and the blank line problem does not arise.
>
>> In my opinion the changes required to introduce white-space characters
>> at the beginning of a line are less intrusive than having to introduce
>> an escaping mechanism.
>
>
> Oh, so far I think the reverse. 822 requires a method to indicate true hard line breaks, for attribute values that need them, and that's an extension to 822. For any common body of text with a number of lines that are empty or start with a non-blank character, all those lines need modifying, whereas an SMTP-like terminator syntax only needs to modify lines that look like a terminator and blank lines, which is far fewer lines in common cases. A terminator-and-escape syntax ensures exact recovery of the original input, whereas with 822 one cannot tell the difference between lines that originally started with a blank, and lines that had one added for 822 compatibility.


OK, you and Glenn have made me see the problems with 822 - indeed we
would need some changes.
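
To make the reversibility problem concrete, here is a minimal sketch in
Python. It assumes a writer that keeps newlines and only adds a leading
blank where a continuation line does not already start with white-space;
the helper name fold_minimal is hypothetical and not part of RFC 822 or
WebVTT.

```python
def fold_minimal(value: str) -> str:
    """Turn a multi-line value into 822-style continuation lines.

    Assumption for illustration: newlines are retained, and a line is only
    modified if it does not already begin with a space or tab.
    """
    out = []
    for i, line in enumerate(value.split("\n")):
        # The first line follows the field name; every later line must
        # begin with white-space to count as a continuation.
        if i > 0 and not line.startswith((" ", "\t")):
            line = " " + line
        out.append(line)
    return "\n".join(out)


# Two different inputs...
plain = "p { color: red }\nq { color: blue }"
indented = "p { color: red }\n q { color: blue }"

# ...produce the same folded text, so a reader cannot tell which one
# the author wrote.
assert fold_minimal(plain) == fold_minimal(indented)

# An empty line becomes a line holding a single, invisible space.
assert fold_minimal("x\n\ny") == "x\n \n y"
```

Always adding exactly one blank to every continuation line would avoid
the collision, but it still leaves the "apparently blank" lines that
David objects to.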

But the "[[" and "]]" proposal is so ugly and not used anywhere else.
I would be surprised if we couldn't find a solution that is being used
elsewhere.
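
For comparison, the terminator-and-escape idea from the quoted text can
be sketched along the lines of SMTP dot-stuffing. This is only an
illustration: the backslash escape and the names encode_block and
decode_block are assumptions of mine, not something proposed in this
thread. Note that lines that already start with the escape character
also need escaping, otherwise the round trip is not exact.

```python
ESC = "\\"          # hypothetical escape character
OPEN, CLOSE = "[[", "]]"


def needs_escape(line: str) -> bool:
    # Blank (or apparently blank) lines, a bare terminator, and lines that
    # already begin with the escape character are the only ones touched.
    return line.strip() == "" or line == CLOSE or line.startswith(ESC)


def encode_block(value: str) -> str:
    """Wrap a multi-line value between '[[' and ']]' lines."""
    body = [ESC + l if needs_escape(l) else l for l in value.split("\n")]
    return "\n".join([OPEN] + body + [CLOSE])


def decode_block(block: str) -> str:
    """Inverse of encode_block: drop the delimiters, remove one escape."""
    body = block.split("\n")[1:-1]
    return "\n".join(l[1:] if l.startswith(ESC) else l for l in body)


css = "p { color: red }\n\n  q { color: blue }\n]]\na ]] b"
encoded = encode_block(css)

# No line inside the block is blank, so a simple parser will not mistake
# it for the end of the header section.
assert all(line.strip() for line in encoded.split("\n"))
# The original text, including leading white-space, comes back exactly.
assert decode_block(encoded) == css
```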

Silvia.

Received on Friday, 20 April 2012 05:23:20 UTC