Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

On Wed, Aug 29, 2012 at 7:46 PM, Ian Hickson <ian@hixie.ch> wrote:

> kind="" and label="" are needed in the container format, since without
> them it's not clear how you would even know what to do with _any_ text
> track; they're not WebVTT-specific in the least. That leaves default="",
> which is not really necessary, but could trivially be supported in any
> container format if truly necessary just by prefixing the WebVTT payload
> with "DEFAULT" or some such.


(I'm really surprised that you're recommending so many proprietary
extensions.  Do you really want WebVTT files floating around with "DEFAULT"
stuck in some arbitrary place?  At least if someone wants to use an
"X-Default" header--not that I'm proposing this as a use case--it's far
less likely to cause parser compatibility headaches later than someone
making up their own extension that "seems to work" for them at the time.)

> So? He's wrong to do so. :-)
>
> Sure, you can turn everything into a name-value syntax if you push hard
> enough. It doesn't mean that that's the right solution for every problem.
>

It's the right solution for the *aggregate* of these use cases.  If there
were only one piece of data that we wanted to store, it probably wouldn't
be; but we don't have just one piece of data.

> WebVTT already has a way to extend it to support new data blocks like
> style, as is also discussed in that bug. I don't see why we'd want to use
> a complicated name-value pair syntax for embedding CSS.
>

Coming up with a new syntax and updating parsers for every new piece of
data is complex.

> First, there have really not been any compelling use cases. All the use
> cases presented are either better handled in other ways in WebVTT (e.g.
> how to embed styles, offsets),


I disagree that special-casing every new piece of data (inline styles, URLs
to external stylesheets, language tags, "kind" tags) is less complex than
defining the format once so parsers don't have to keep changing.  It's a
mess to write a "STYLE" header parser, and then a "Language" parser, and
then an "External-Stylesheet" parser, and then, and then ...  Putting these
features in the parser is the wrong layer of abstraction; they belong on
top of the parser, not within it.

The inverse is also true: editors would need code to output "STYLE", then
code to output "Language", and so on.  A clean Python API for
manipulating WebVTT files would look like:

>>> vtt = webvtt.open('file.vtt')
>>> vtt.headers.get("Language")
'en'
>>> vtt.headers['Language'] = 'fr'
>>> vtt.write('file.vtt')

without the parser needing to know anything at all about "Language", so
when "External-Stylesheet" or "Style" are added later and I want to support
them in my editor (or more simply, to write a script that reads and modifies
a file, as above), nothing changes in the module.  With your piece-by-piece
solution, this is impossible.  At best, it'd have to expose "unknown header
chunks" and make me parse them out by hand, which would be a terrible API.
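
To make the layering concrete, here is a minimal sketch of such a module.
It assumes the header block is a run of "Name: value" lines between the
WEBVTT signature line and the first blank line; that syntax is the
proposal being discussed here, not something the WebVTT spec defines, and
the function names (parse_vtt, serialize_vtt) are made up for illustration.

```python
import re

# Hypothetical header line syntax, "Name: value". This is the proposal under
# discussion, not part of the WebVTT spec.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")


def parse_vtt(text):
    """Split a file into (signature line, header dict, untouched remainder)."""
    lines = text.split("\n")
    if not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    headers = {}
    i = 1
    # The header block runs until the first blank line.
    while i < len(lines) and lines[i].strip():
        match = HEADER_RE.match(lines[i])
        if match:
            # The parser stores the pair without interpreting the name.
            headers[match.group(1)] = match.group(2)
        # Lines that are not "Name: value" are skipped in this sketch, so
        # they would be lost when the file is written back out.
        i += 1
    return lines[0], headers, "\n".join(lines[i:])


def serialize_vtt(signature, headers, rest):
    """Write the signature, the headers in insertion order, then the rest."""
    head = [signature] + [f"{name}: {value}" for name, value in headers.items()]
    return "\n".join(head) + "\n" + rest


source = (
    "WEBVTT\n"
    "Language: en\n"
    "Kind: subtitles\n"
    "\n"
    "00:00:01.000 --> 00:00:02.000\n"
    "Hello\n"
)
signature, headers, rest = parse_vtt(source)
headers["Language"] = "fr"
# A header name this code has never heard of needs no parser change.
headers["External-Stylesheet"] = "captions.css"
updated = serialize_vtt(signature, headers, rest)
```

Nothing in parse_vtt or serialize_vtt mentions "Language" or
"External-Stylesheet"; whatever gives those names meaning sits on top of
the parser.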

(I suspect this is the main point of disagreement.)

or are already handled sufficiently by
> WebVTT now or WebVTT with other additions like the block comment syntax
> (e.g. anything involving proprietary workflow additions only needed during
> production).
>

I'm pretty sure the use case presented covered the *standard* parts of the
workflow (e.g. the language and kind fields, which are later consumed by the
WebM muxing tool or an HTML generator outputting <track> elements), not a
proprietary workflow.
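
As a rough illustration of that last step, an HTML generator could map
already-parsed header fields onto <track> attributes along these lines.
The header names ("Kind", "Language", "Label") and the track_element
helper are assumptions made for the example; only the attribute names
(src, kind, srclang, label) come from HTML itself.

```python
from html import escape

# Assumed mapping from header names to <track> attribute names.
HEADER_TO_ATTR = (("Kind", "kind"), ("Language", "srclang"), ("Label", "label"))


def track_element(src, headers):
    """Build a <track> tag from a dict of already-parsed header fields."""
    attrs = [("src", src)]
    for header, attr in HEADER_TO_ATTR:
        if header in headers:
            attrs.append((attr, headers[header]))
    # Escape values so quotes or angle brackets cannot break the markup.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs
    )
    return f"<track {rendered}>"


tag = track_element("file.vtt", {"Language": "en", "Kind": "subtitles"})
```

Headers the generator does not recognize are simply ignored, so a
production tool's extra fields would not affect the output.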

-- 
Glenn Maynard

Received on Thursday, 30 August 2012 01:50:07 UTC