Re: meta-data in the VTT file header, a strawman proposal

On Fri, Apr 20, 2012 at 2:09 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Thu, Apr 19, 2012 at 8:51 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> The difference between JSON and the RFC822 *header* specification is
>> that JSON requires quotation marks around strings and does not allow
>> newlines in values.
>
>
> There's a good bit more than that.  JSON is a much simpler specification,
> despite having a broader vocabulary (though RFC-822-style headers could
> certainly be specified much more simply than RFC-822 itself does it, of
> course).
>
>> Thus, IMO, RFC822 headers are actually an improvement over JSON.
>> As with JSON, everyone has an RFC822 header parser available. All
>> values are inherently strings, but converted to their proper type by
>> interpretation of the name.
>
>
> (I'm not at all wedded to JSON, and I'm not even convinced it's the right
> idea myself, but I think it's worthwhile to play out the idea.)
>
> JSON parsers are far more ubiquitous than RFC-822 parsers.  You don't have
> one in JavaScript, but you really do always have a JSON parser.  All you'd
> need to do is read a line, split on the colon, and feed the right-hand side
> to the JSON parser.
>
> Existing RFC-822 parsers may implement unwanted features, like RFC-2047.
> Python's "email" module supports things like "message/delivery-status",
> which may expose behavior not wanted by WebVTT.  You'll never want to use a
> stock parser; you'll need to implement your own, and WebVTT will want to
> specify its own minimal subset of RFC-822 (it definitely couldn't just
> reference RFC-822 and say "do what that says").

Yes, I think you're right there. I think we might end up with
basically the JSON spec, plus an addition that says how to deal with
multi-line values.
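
To make the "read a line, split on the colon, feed the right-hand side
to a JSON parser" idea concrete, here is a rough sketch in Python. The
function names and the rule that a blank line ends the header are my
own assumptions for illustration, not anything WebVTT specifies.

```python
import json

def parse_header_line(line):
    """Split one 'Name: <JSON value>' line into a (name, value) pair."""
    # Split on the first colon only; later colons belong to the value.
    name, sep, raw = line.partition(":")
    if not sep:
        raise ValueError("missing colon: %r" % line)
    # json.loads tolerates the whitespace that follows the colon.
    return name.strip(), json.loads(raw)

def parse_header(text):
    """Parse header lines up to the first blank line (assumed terminator)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break
        name, value = parse_header_line(line)
        fields[name] = value
    return fields

# Hypothetical header; note the quotes and the escaped newlines in Style.
example = (
    'Kind: "captions"\n'
    'Initial-Timestamp: 1000\n'
    'Style: "p {\\n  font-size: 100px;\\n}"\n'
)
header = parse_header(example)
```

With this, header["Initial-Timestamp"] comes back as a number and
header["Style"] as a string with real newlines in it, but the author
has to write the whole stylesheet on one line with \n escapes.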


>  JSON doesn't have this
> problem: you just read a line, split on the colon and feed the right-hand
> side to a standard JSON parser.  Standard parsers and the existing specs are
> all that's needed.
>
> JSON allows editors to edit and import strings directly, with no changes to
> the data.  The text you import is the text you save; everything (at least,
> all valid Unicode text) round-trips.  With RFC-822, you need to insert
> leading whitespace before continuation lines, so it has trouble maintaining
> this property.
>
> (Still, it's ugly that embedding a stylesheet would end up looking messy in
> a plain text editor.  You don't really want to flatten a whole stylesheet
> into one line.  We should be able to find an approach with none of the
> negatives: clean in plain text, round-trips all text, while remaining
> simple...)

Yeah, I think ultimately we do need a multi-line parsing capability.
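
Here is a small sketch of the round-trip point, again in Python. The
fold() and unfold() helpers are hypothetical and follow a simplified
reading of RFC 822 unfolding (a line break followed by whitespace is
treated as just the whitespace); I also assume bare "\n" line endings
instead of CRLF.

```python
import json
import re

stylesheet = "p {\n  font-size: 100px;\n}"

# JSON: the value is flattened onto one physical line, but it
# decodes back to exactly the text that was put in.
encoded = json.dumps(stylesheet)
decoded = json.loads(encoded)

def fold(value):
    """Write a multi-line value as RFC-822-style continuation lines."""
    return value.replace("\n", "\n ")

def unfold(folded):
    """Simplified RFC 822 unfolding: drop a line break that is
    followed by a space or tab, keeping the whitespace itself."""
    return re.sub(r"\r?\n(?=[ \t])", "", folded)

# RFC-822 style: after unfolding, the original newlines are gone.
recovered = unfold(fold(stylesheet))
```

Here decoded is identical to stylesheet, while recovered has lost its
line breaks, so a folding scheme would need extra rules to keep them.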

It seems that there are work-arounds for multi-line JSON, see
http://stackoverflow.com/questions/2392766/multiline-strings-in-json
and
http://stackoverflow.com/questions/2033729/multi-line-elements-in-json .

However, I still don't like the extra quotes that are required around
everything. How about YAML associative arrays then? We could leave out
the hierarchical bits. YAML specifies both a newline-preserving form
(literal style, "|") and a newline-folding form (folded style, ">").

For example:

Kind:  captions
Initial-Timestamp: 1000
Style: |
 p {
   font-size: 100px;
 }
 .class {
   color: red;
 }
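
A flat subset like this would not need a full YAML parser. As a rough
sketch in Python, assuming only "Name: value" pairs and "|" literal
blocks are allowed (parse_yaml_subset is a hypothetical name, and the
indentation handling is simplified compared to real YAML):

```python
def parse_yaml_subset(text):
    """Parse flat 'Name: value' pairs plus '|' literal blocks."""
    fields = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.strip():
            continue
        name, sep, rest = line.partition(":")
        if not sep:
            raise ValueError("missing colon: %r" % line)
        rest = rest.strip()
        if rest != "|":
            # Plain value: kept as a string, no quotes required.
            fields[name.strip()] = rest
            continue
        # Literal block: collect the following indented or blank lines.
        block = []
        while i < len(lines) and (lines[i].startswith(" ") or not lines[i].strip()):
            block.append(lines[i])
            i += 1
        # Drop trailing blank lines so the value ends with one newline.
        while block and not block[-1].strip():
            block.pop()
        # Strip the indentation of the first non-blank line from every line.
        indent = next((len(l) - len(l.lstrip(" ")) for l in block if l.strip()), 0)
        fields[name.strip()] = "".join(l[indent:] + "\n" for l in block)
    return fields

example = (
    "Kind:  captions\n"
    "Initial-Timestamp: 1000\n"
    "Style: |\n"
    " p {\n"
    "   font-size: 100px;\n"
    " }\n"
)
header = parse_yaml_subset(example)
```

All values stay strings here (Initial-Timestamp is "1000"), so the
conversion to a proper type would still happen by interpretation of the
name, and the Style value keeps its line breaks.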

It also has some other niceties, such as a way to specify binary data. See
http://en.wikipedia.org/wiki/YAML .

Regards,
Silvia.

Received on Friday, 20 April 2012 04:54:20 UTC