
Re: meta-data in the VTT file header, a strawman proposal

From: Ralph Giles <giles@mozilla.com>
Date: Fri, 24 Feb 2012 19:36:15 -0800
Message-ID: <4F48572F.7040203@mozilla.com>
To: David Singer <singer@apple.com>
CC: public-texttracks@w3.org

On Fri 24 Feb 2012 03:44:10 PM PST, David Singer wrote:

> This is a strawman proposal for the VTT header data (aka
> meta-data).

Thanks for making a concrete proposal. Comments inline below.

> The 'meta-data' consists of a series of keyword-value pairs,
> separated by an equals ('=') sign, with white space around the
> equals sign ignored.

As I mentioned on the bug, I think using a colon (':') as the
key-value separator is easier to read in such a text-oriented format.
I am inspired by RFC 822 and HTTP message headers, which unlike HTML
attributes have an established multi-line extension.

> 1) plain keywords, consisting of the upper and lower-case 'ASCII'
> letters, and numerals, only;

I think we should match the attribute character set. WebVTT documents
themselves are UTF-8, so I don't see the value of this restriction.

I would however recommend that any keywords this group defines respect
this restriction, or something like it.

> Case is not significant in keywords.

We can likewise borrow the 'ASCII case-insensitive' matching used by
HTML attributes to extend this to UTF-8 keywords. That is, ASCII
letters are compared case-insensitively, but other Unicode characters
are compared exactly, code point by code point.
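
As a rough sketch of what I mean by that matching rule (Python; the
function names are made up for illustration and are not part of any
proposal):

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other code point untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def keys_match(a: str, b: str) -> bool:
    """ASCII case-insensitive comparison of two header keywords."""
    return ascii_lower(a) == ascii_lower(b)
```

With this, "Kind" and "kIND" match, but two keywords that differ only
in the case of a non-ASCII letter (say, an accented capital versus its
lowercase form) do not.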

> 2) qualified keywords, that contain one or more dash ('-')
> characters

I don't think vendor prefixes have been a success in CSS. I'd rather
see us agree on things quickly and/or reuse the _attr or x-vendor-attr
extensions from the HTML spec.

> [multiline quotes with [[..]] and '\' to escape empty lines] Note
> that it is an error to encounter a blank or white-space only line,
> while accumulating lines before the closing ']]' line; such lines
> are required to have been escaped.

Do you propose that the whole file be rejected in that case? A parser
ignoring metadata is just looking for a repeated eol sequence, so
whitespace-only lines aren't really a problem. If there's an unescaped
newline, the parser will try to interpret the remaining metadata as a
cue, which will fail unless it happens to contain a timestamp.
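
To illustrate the point about a metadata-ignoring parser, here is a
minimal sketch in Python. It assumes the header ends at the first
truly empty line and that CRLF, CR and LF all count as line
terminators; skip_header is a hypothetical name, not anything from the
spec.

```python
def skip_header(text: str) -> str:
    """Return everything after the first empty line, or '' if there is none."""
    # Normalise CRLF and lone CR to LF so one search covers all eol styles.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two consecutive line terminators mark the end of the header block.
    _header, separator, rest = normalised.partition("\n\n")
    return rest if separator else ""
```

A line containing only spaces does not produce two adjacent
terminators, so it does not end the header in this sketch.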

Message-Header style gives us another way to escape continued lines,
which is to indent them with whitespace. This isn't any harder
programmatically, but is easier to read. (You did it anyway in your
example!) It might be more confusing to debug blank-line escapes, though.
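
Here is roughly how the indentation rule could be handled, as a Python
sketch under my own assumptions: a line starting with a space or tab
continues the previous value, continuation pieces are joined with
newlines, and anything that is neither a field nor a continuation is
silently skipped. parse_header is an illustrative name only.

```python
def parse_header(lines):
    """Parse 'Key: value' lines; indented lines continue the previous value."""
    fields = []  # list of (key, value) tuples, in file order
    for line in lines:
        if line[:1] in (" ", "\t") and fields:
            # Continuation line: append its stripped text to the last value.
            key, value = fields[-1]
            piece = line.strip()
            fields[-1] = (key, f"{value}\n{piece}" if value else piece)
        elif ":" in line:
            # New field: split on the first colon only.
            key, _, value = line.partition(":")
            fields.append((key.strip(), value.strip()))
        # Lines matching neither rule are ignored in this sketch.
    return fields
```

Checking for the leading whitespace before looking for a colon matters,
since continued CSS lines are full of colons themselves.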

> Examples:
> kind=captions examplecompany-test = for steve 
> initialTStimestamp=162642774 
> stylesheetURL=http://www.example.com/vtt-plain.css stylesheet=
> [[ p { font-size: 100px; } \ p::first-line { background:
> url(http://www.w3.org/StyleSheets/TR/logo-REC) no-repeat; 
> font-size: 10px; span { border-left: solid 1em black; } } ]] 
> srclang=en-US label=Zeroes for King!

My suggestion would look like:

Kind: captions
X-examplecompany-test: for steve
Timestamp-offset: 162642774
StylesheetURL: http://www.example.com/vtt-plain.css
 p { font-size: 100px; }

 p::first-line {
   background: url(http://www.w3.org/StyleSheets/TR/logo-REC) no-repeat;
   font-size: 10px;
   span { border-left: solid 1em black; }
srclang: en_US
Label: 𝟎s for 王!

There is one argument for using '=' instead of ':' as the separator,
which is that namespaced XHTML attributes contain colons, but cannot
contain an equals sign. If that's a concern (a la dc:creator) we can
make the separator ':' plus whitespace.
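
A quick Python sketch of the "':' plus whitespace" variant, to show
that it keeps namespaced keys intact. The name split_field and the
choice to also accept a colon at the very end of the line (for a key
whose value is entirely on continuation lines) are my own assumptions.

```python
import re

# A colon followed by spaces/tabs, or a colon at the end of the line.
_SEPARATOR = re.compile(r":(?:[ \t]+|$)")


def split_field(line: str):
    """Split a header line into (key, value), or return None if no separator."""
    match = _SEPARATOR.search(line)
    if match is None:
        return None
    return line[:match.start()], line[match.end():]
```

So "dc:creator: Ralph Giles" splits into the key "dc:creator" and the
value "Ralph Giles", while "Kind:captions" is not treated as a field
at all.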

We could avoid the stylesheetURL keyword by using "style: @import('url')".


Received on Saturday, 25 February 2012 03:36:44 UTC
