Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this) from Silvia Pfeiffer on 2012-08-31 (public-texttracks@w3.org from August 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 31 Aug 2012 11:44:39 +1000
To: Ian Hickson <ian@hixie.ch>
Cc: public-texttracks <public-texttracks@w3.org>
Message-ID: <CAHp8n2=LuxuDv-0xuaS4D9RPiZypPaojOZmBX65qQsYpAjQWqQ@mail.gmail.com>

On Fri, Aug 31, 2012 at 3:22 AM, Ian Hickson <ian@hixie.ch> wrote:
>
> That's not how language design works. You don't pick an arbitrary set of
> use cases and design a solution that fits all of them poorly. You pick a
> single use case, design a good solution for it, and move on to other use
> cases, and if many of them have very similar solutions then you consider
> how they might be made to work more uniformly.

We've done exactly that. We have Language, Stylesheet and Kind as
name-value pairs that need to be provided as header-style data.
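
To make that concrete, here is a rough Python sketch of what a shared
header parser could look like. The "Name: value" line syntax, the
placement of the pairs directly under the WEBVTT signature line, and
the function name parse_vtt_header are all assumptions made for
illustration; none of this is agreed syntax.

```python
def parse_vtt_header(text):
    """Return (metadata dict, remaining body) for a WebVTT-like file.

    Assumes a hypothetical header syntax: "Name: value" lines between
    the WEBVTT signature line and the first blank line.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    metadata = {}
    index = 1
    # Header lines run until the first blank line.
    while index < len(lines) and lines[index].strip():
        name, sep, value = lines[index].partition(":")
        if sep:  # skip header lines that are not name-value pairs
            metadata[name.strip()] = value.strip()
        index += 1
    body = "\n".join(lines[index + 1:])
    return metadata, body


# Hypothetical sample file using the three pairs named above.
SAMPLE = """WEBVTT
Language: en
Kind: captions
Stylesheet: captions.css

00:00:01.000 --> 00:00:04.000
Hello, world.
"""

metadata, body = parse_vtt_header(SAMPLE)
print(metadata)
```

A tool that only cares about Language can read metadata["Language"]
and ignore the rest, without needing its own parser.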

>> When the WebVTT file is authored, there is no <track> attribute to get
>> the information about @kind or @language out of, or to associate that
>> information with. The WebVTT file stands alone all by itself.
>
> Right, just like how a CSS file doesn't say if it's an alternative style
> sheet or what its title is.

CSS does not make sense as stand-alone content. WebVTT does. You
cannot compare CSS and WebVTT, but you can compare HTML and WebVTT.
We've introduced many features to provide semantics and
self-inspection in HTML. WebVTT needs that too.

>> Asking for that information out of band (i.e. outside the WebVTT file)
>> is an utter pain and prone to error when we already have a text file
>> that has space to carry this information.
>
> Having that information in-band means you have to read the file to know
> what to do with it. That doesn't make sense. This kind of information
> belongs in the place that embeds or links to the file, not in the file.

Not at all. When VLC reads the video and the WebVTT file, there is no
other file "embedding the WebVTT file". WebVTT is a piece of content
that needs to be able to stand on its own and needs to provide
information about itself to caption infrastructure.

>> That is because the browsers generally don't make use of the name-value
>> pairs and Web pages are written basically for browsers, not for anything
>> else.
>>
>> This is not the case here.
>
> It's exactly like HTML. There's a small number of things (language,
> formatting defaults, style sheets) that make sense for the user agent to
> consume, just like in HTML, where <meta> has a small number of values that
> make sense for user agents.

Agreed. We have identified those for WebVTT, too.

> And then there's a zillion other fields that
> authors will put in if we let them, that will waste their time, etc, as
> described above. Things like copyright notices, intended audiences, etc.

Let other standards bodies that need this information define it if you
like. But let's please not have everyone define their own means of
parsing it.

The only thing that XML has actually been successful at is providing a
means to mark up content such that a common parser can parse it without
knowing what's in it. JSON does the same, but more simply. I call that
a success, not a failure.
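
As a small illustration of that point, the Python sketch below uses
the standard json module as the common parser. The field names, the
KNOWN set and the sample values are made up for the example; the point
is only that the parser needs no knowledge of the fields, and each
consumer picks out the ones it understands.

```python
import json

# Hypothetical metadata; the parser does not need to know these names.
RAW = '{"Language": "en", "Kind": "captions", "Copyright": "Example Corp"}'

# Fields this particular (hypothetical) consumer understands.
KNOWN = {"Language", "Kind", "Stylesheet"}

fields = json.loads(RAW)

# Split into what this consumer acts on and what it merely carries along.
understood = {k: v for k, v in fields.items() if k in KNOWN}
passthrough = {k: v for k, v in fields.items() if k not in KNOWN}

print(understood)
print(passthrough)
```

The unknown fields survive parsing untouched, so another tool further
along the caption lifecycle can still use them.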


>> Here we deal with an industry that is using caption and other text track
>> files to display in different players, many of which are not Web
>> browsers. Files are being embedded into video files and extracted again,
>> all without a Web browser. All the information that we need has to be
>> self-contained - we cannot rely on a Web page providing additional
>> information.
>
> If there is _specific information_ that is needed for _specific use
> cases_, then please file bugs for those.

I have.

>> On the contrary: it is a huge waste of time to have to write a different
>> name-value-pair parser for every WebVTT provider.
>
> The data I'm talking about _isn't consumed_. So there's no parser to
> write.

The data I am talking about probably isn't consumed by browsers, but
it certainly is consumed by other technologies in the caption
lifecycle.

>> Agreed, in particular for HTML which is already massive.
>
> WebVTT is IMHO part of HTML. (Or rather, both are part of the Web
> Platform, which is already massive.)

This is where all the problems originate. Providing captions to HTML
is only one role of WebVTT. It is also used in many other contexts
that no caption file format can escape, such as TVs and desktop
players. If you are saying that you're not interested in satisfying
these use cases, then let's stop this discussion and specify
header-style metadata outside of WebVTT.

Regards,
Silvia.
Received on Friday, 31 August 2012 01:45:27 UTC