Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this)

On Wed, 29 Aug 2012, Glenn Maynard wrote:
> 
> To give an example, when you mux audio, video and caption files into a 
> .WebM file, you currently need to supply metadata to the muxing 
> software, e.g.:
> 
> mkvmerge -o example.mkv video.avi audio.ogg --language 2:ger german.vtt \
>   --language 3:eng english.vtt
> 
> By including this inline, muxers can find this out directly:
> 
> mkvmerge -o example.mkv video.avi audio.ogg *.vtt
> 
> The current approach causes users (me) headaches, and inline metadata 
> would greatly simplify this very common case.

Language information is needed for other reasons too, there's a bug about 
it.

However, needing language metadata is not an argument for arbitrary 
metadata. Other than possibly similar syntax, there's no relationship 
between a defined semantic feature of the language and a dumping ground 
for proprietary metadata. Furthermore, we have learnt over the years that 
it is critical to keep the well-defined semantic data in the language 
cleanly separate from the author-driven random metadata, or they end up 
conflicting and generally being a huge mess. Just look at <meta> in HTML. 
It's the reason we gave up trying to give meaning to class="". It's the 
reason we designed microdata to have very clear separation between defined 
types and vocabularies and arbitrary types with no vocabularies. It's the 
reason CSS has clear syntax for things like @charset that is not generic 
and leaves proprietary metadata to comments.
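To make that separation concrete, here is a minimal sketch (not spec text) 
of a reader that gives meaning only to a closed set of defined header 
fields and drops everything else. "Language" is a placeholder for whatever 
bug 15922 ends up defining; the field name, the "Name: value" form, and 
read_defined_headers are all assumptions for illustration.

```python
from __future__ import annotations

# Closed set of fields with defined semantics. "Language" is a hypothetical
# placeholder; nothing outside this set is ever given meaning.
DEFINED_FIELDS = {"Language"}


def read_defined_headers(vtt_text: str) -> dict[str, str]:
    """Return the defined header fields that appear between the WEBVTT
    signature line and the first blank line; unknown fields are dropped."""
    lines = vtt_text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break  # the header area ends at the first blank line
        name, sep, value = line.partition(":")
        if sep and name.strip() in DEFINED_FIELDS:
            fields[name.strip()] = value.strip()
    return fields
```

A vendor line such as "X-Vendor-Id: 1234" never shows up in the result, so 
it can't collide with a defined field.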


> Muxing software shouldn't be storing data in proprietary magic headers 
> and then parsing them out. That's nuts--it means the format *will* have 
> key/val headers, just in an ad hoc, nonstandardized, uninteroperable 
> way.

What difference does it make whether we provide a syntax or not? The data 
is still proprietary; the value is still in a proprietary syntax.

This is the fallacy that led to XML. People thought that providing a 
standard syntax would mean everything would interoperate and we could dump 
HTML overboard. It doesn't work that way. What matters isn't the syntax; 
syntax is easy. What matters is the semantics.

If we're not adding semantics, we shouldn't add syntax. We can't know what 
the syntax will _need_ if we don't have the semantics.



> a framework for future data if needed

Providing "frameworks for future data if needed" is an anti-goal and a 
language design anti-pattern.


On Wed, 29 Aug 2012, David Singer wrote:
> On Aug 29, 2012, at 17:46 , Ian Hickson <ian@hixie.ch> wrote:
> > 
> > What data are we talking about here, anyway? <track> only has four 
> > relevant attributes as far as I can tell; srclang="" will be dealt 
> > with inline via this bug:
> > 
> >   https://www.w3.org/Bugs/Public/show_bug.cgi?id=15922
> 
> which has a key: value syntax:

I'm not arguing against key-value syntax, I'm arguing against arbitrary 
metadata. The syntax is an 


> > As in:
> > 
> >   WEBVTT
> > 
> >   00:11.000 --> 00:13.000
> >   <v Roger Bingham>We are in New York City
> > 
> >   OFFSET -01:00.000
> > 
> >   01:13.000 --> 01:16.000
> >   <v Roger Bingham>We're actually at the Lucern Hotel, just down the 
> > 
> > ...or some such.
> 
> I can't see how you could do that in a backwards-compatible fashion.  

I don't understand. Backwards-compatible with what? How is it less 
compatible with anything than having the offset elsewhere?
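For what it's worth, the OFFSET strawman above is easy to process in a 
single pass. A minimal sketch, assuming a hypothetical "OFFSET <timestamp>" 
block that shifts every later cue (nothing like it is in WebVTT today), 
blank-line-separated blocks, and plain newline line endings; parse_ms and 
cue_times are illustrative names.

```python
from __future__ import annotations

import re

# WebVTT-style timestamp "[hh:]mm:ss.ttt", with an optional leading "-" so the
# same pattern can read the (hypothetical) OFFSET value.
TIMESTAMP = re.compile(r"^(-?)(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$")


def parse_ms(ts: str) -> int:
    """Convert '[-][hh:]mm:ss.ttt' to signed milliseconds."""
    m = TIMESTAMP.match(ts.strip())
    if not m:
        raise ValueError(f"bad timestamp: {ts!r}")
    sign, hh, mm, ss, ttt = m.groups()
    ms = ((int(hh or 0) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(ttt)
    return -ms if sign else ms


def cue_times(vtt_text: str) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) for each cue, shifted by the most recent
    OFFSET block seen so far (hypothetical syntax, not part of WebVTT)."""
    offset = 0
    times = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if not lines:
            continue
        if lines[0].startswith("OFFSET "):
            offset = parse_ms(lines[0][len("OFFSET "):])
            continue
        for line in lines:
            if "-->" in line:
                start, end = (part.split()[0] for part in line.split("-->"))
                times.append((parse_ms(start) + offset, parse_ms(end) + offset))
                break  # the first "-->" line is the cue timing line
    return times
```

Run on the example above, the second cue comes out at 00:13.000 --> 
00:16.000, i.e. (13000, 16000) in milliseconds, right after the first.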


On Wed, 29 Aug 2012, Glenn Maynard wrote:
>
> I'm really surprised that you're recommending so many proprietary 
> extensions.

I'm not recommending any more than anyone else; I'm just saying we 
shouldn't try to second-guess their needs and offer a specific place for 
them other than just comments.


> Do you really want WebVTT files floating around with "DEFAULT" stuck in 
> some arbitrary place?

It wouldn't be WebVTT, any more than <track default> is WebVTT.


> At least if someone wants to use an "X-Default" header--not that I'm 
> proposing this as a use case--it's far less likely to cause parser 
> compatibility headaches later than someone making up his own extension 
> that "seems to work" to them at the time.

I'm not suggesting putting "default" information in WebVTT. I'm suggesting 
putting it _before_ the WebVTT file in the media stream payload.
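As a sketch of what "before the WebVTT file" could look like: the track 
record carries properties such as language, kind, and default, and the 
WebVTT bytes follow unmodified. The length-prefixed JSON framing and the 
TrackProperties, pack_track, and unpack_track names are hypothetical; real 
containers (Matroska, for instance) already have their own track-level 
fields for language and the default flag.

```python
from __future__ import annotations

import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class TrackProperties:
    # Container-level metadata (hypothetical field set); never written into
    # the WebVTT text itself.
    language: str
    kind: str
    default: bool


def pack_track(props: TrackProperties, vtt_text: str) -> bytes:
    """Hypothetical framing: 4-byte big-endian length, JSON properties,
    then the WebVTT file byte-for-byte."""
    header = json.dumps(asdict(props)).encode("utf-8")
    return struct.pack(">I", len(header)) + header + vtt_text.encode("utf-8")


def unpack_track(payload: bytes) -> tuple[TrackProperties, str]:
    """Inverse of pack_track: the properties come first, the WebVTT text after."""
    (n,) = struct.unpack_from(">I", payload)
    props = TrackProperties(**json.loads(payload[4:4 + n]))
    return props, payload[4 + n:].decode("utf-8")
```

The WebVTT text round-trips untouched; only the framing around it knows 
about "default".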


> It's the right solution for the *aggregate* of these use cases.

That's not how language design works. You don't pick an arbitrary set of 
use cases and design a solution that fits all of them poorly. You pick a 
single use case, design a good solution for it, and move on to other use 
cases, and if many of them have very similar solutions then you consider 
how they might be made to work more uniformly.

In the case here, I'm at a loss to find _any_ use cases whose needs are 
similar to each other.


> I disagree that special casing every new piece of data (inline styles, 
> URLs to external stylesheets, language tags, "kind" tags) is less 
> complex than defining the format once so parsers don't have to keep 
> changing.

The complex part of adding CSS to WebVTT implementations is not the syntax 
for adding a new block to the WebVTT parser. That part is trivial.
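To illustrate how small the parser side is, a minimal sketch that sorts 
blank-line-separated blocks into style sheets and cues, assuming a 
hypothetical "STYLE" block keyword (split_blocks is an illustrative name). 
Recognising the new block is one branch; everything that makes CSS hard 
(selector matching, the cascade, rendering) is not touched here.

```python
from __future__ import annotations


def split_blocks(vtt_text: str) -> dict[str, list[str]]:
    """Sort the blocks after the WEBVTT header into style sheets and cues.
    "STYLE" is a hypothetical keyword; supporting it costs one branch."""
    result: dict[str, list[str]] = {"style": [], "cue": []}
    blocks = [b for b in vtt_text.strip().split("\n\n") if b.strip()]
    for block in blocks[1:]:  # blocks[0] is the WEBVTT signature block
        first, _, rest = block.partition("\n")
        if first.strip() == "STYLE":
            result["style"].append(rest)  # raw CSS, handed off to a CSS engine
        elif not (first == "NOTE" or first.startswith(("NOTE ", "NOTE\t"))):
            result["cue"].append(block)  # NOTE comment blocks are skipped
    return result
```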


> The inverse is also true: editors needing to write code to output 
> "STYLE", then code to output "Language", and so on.  A clean Python API 
> for manipulating WebVTT files would look like:
> 
> >>> vtt = webvtt.open('file.vtt')
> >>> vtt.headers.get("Language")
> en
> >>> vtt.headers['Language'] = 'fr'
> >>> vtt.write('file.vtt')
> 
> without the parser needing to know anything at all about "Language", so 
> when "External-Stylesheet" or "Style" are added later and I want to 
> support it in my editor (or more simply, to write a script that reads 
> and modifies a file, as above), nothing changes in the module.  With 
> your piece-by-piece solution, this is impossible.  At best, it'd have to 
> expose "unknown header chunks" and make me parse it out by hand, which 
> would be a terrible API.

This is the same kind of reasoning that leads to things like XML. It does 
not lead to simple languages.

The solution is not to try to come up with an uber-meta-language that 
solves all possible future problems (except all the ones we didn't think 
of -- e.g. XML totally fails at non-tree data structures). The solution is 
to just not change the language very often.

 
> > or are already handled sufficiently by WebVTT now or WebVTT with other 
> > additions like the block comment syntax (e.g. anything involving 
> > proprietary workflow additions only needed during production).
> 
> I'm pretty sure the use case presented was the *standard* parts of the 
> workflow (eg. the language and kind fields, which are later consumed by 
> the WebM muxing tool or an HTML generator outputting <track> fields), 
> not proprietary workflow.

It's hard for me to know since there haven't been any concrete examples so 
far.

What I would like to see is concrete examples, filed as bugs.


On Thu, 30 Aug 2012, Silvia Pfeiffer wrote:
>
> When the WebVTT file is authored, there is no <track> attribute to get 
> the information about @kind or @language out of, or to associate that 
> information with. The WebVTT file stands alone all by itself.

Right, just like how a CSS file doesn't say if it's an alternative style 
sheet or what its title is.


> It may continue directly to the Web from here - in which case the Web 
> page author has to ask the WebVTT author for this additional 
> information.
> 
> It may also continue directly into a container format from here - in 
> which case the container encoder has to ask the WebVTT author for the 
> additional information.
>
> Asking for that information out of band (i.e. outside the WebVTT file) 
> is an utter pain and prone to error when we already have a text file 
> that has space to carry this information.

Having that information in-band means you have to read the file to know 
what to do with it. That doesn't make sense. This kind of information 
belongs in the place that embeds or links to the file, not in the file.

Just like with CSS.
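Concretely, the page that embeds the file is where kind, srclang, label, 
and default already live. A minimal sketch that emits a <track> element 
from out-of-band metadata; the track_element helper and its parameters are 
hypothetical, though the attribute names are the real <track> ones.

```python
from __future__ import annotations

from html import escape


def track_element(src: str, kind: str, srclang: str, label: str,
                  default: bool = False) -> str:
    """Build an HTML <track> element from metadata held by the embedding page,
    not by the WebVTT file it points at."""
    attrs = [("kind", kind), ("src", src), ("srclang", srclang), ("label", label)]
    markup = " ".join(f'{name}="{escape(value)}"' for name, value in attrs)
    return f"<track {markup}{' default' if default else ''}>"
```

For example, track_element("german.vtt", "subtitles", "de", "Deutsch", 
default=True) gives <track kind="subtitles" src="german.vtt" srclang="de" 
label="Deutsch" default>.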


On Thu, 30 Aug 2012, Silvia Pfeiffer wrote:
> 
> Your suggestion will lead to every vendor re-inventing its own
> name-value markup at the start of WebVTT files.
> I don't call that a standard.

It's not a standard. Doesn't have to be. It's vendor-proprietary data that 
differs from vendor to vendor.


> >> Can you explain why you want to resist what many of us see as a 
> >> natural direction to go?
> >
> > Two reasons.
> >
> > First, there have really not been any compelling use cases. All the 
> > use cases presented are either better handled in other ways in WebVTT 
> > (e.g. how to embed styles, offsets), or are already handled 
> > sufficiently by WebVTT now or WebVTT with other additions like the 
> > block comment syntax (e.g. anything involving proprietary workflow 
> > additions only needed during production). Adding a feature that 
> > doesn't have compelling use cases is a recipe for disaster.
> >
> > Second, what we have seen with HTML is that providing arbitrary 
> > name-value pair syntax that anyone can plug into tends to lead authors 
> > down this massive rabbit hole of time-wasting. People see name-value
> > pair metadata syntax and they go crazy adding all kinds of metadata in 
> > random syntaxes to it, with no common vocabulary, no common processing 
> > model, and with absolutely no idea what is ever going to consume it. 
> > And then: nothing consumes it.
> 
> That is because the browsers generally don't make use of the name-value 
> pairs and Web pages are written basically for browsers, not for anything 
> else.
>
> This is not the case here.

It's exactly like HTML. There's a small number of things (language, 
formatting defaults, style sheets) that make sense for the user agent to 
consume, just like in HTML, where <meta> has a small number of values that 
make sense for user agents. And then there's a zillion other fields that 
authors will put in if we let them, that will waste their time, etc, as 
described above. Things like copyright notices, intended audiences, etc.


> Here we deal with an industry that is using caption and other text track 
> files to display in different players, many of which are not Web 
> browsers. Files are being embedded into video files and extracted again, 
> all without a Web browser. All the information that we need has to be 
> self-contained - we cannot rely on a Web page providing additional 
> information.

If there is _specific information_ that is needed for _specific use 
cases_, then please file bugs for those.

That has nothing to do with generic metadata infrastructure.


> It is indeed conceivable that many proprietary name-value tags will be 
> created in addition to the small set that we have suggested for kind, 
> language, label, in-band style sheet, and external style sheet. And 
> indeed the CEA-608 document that I've proposed already shows some that
> are typically used by caption providers.

This is the disaster that IMHO we must avert.


> On the contrary: it is a huge waste of time to have to write a different 
> name-value-pair parser for every WebVTT provider.

The data I'm talking about _isn't consumed_. So there's no parser to 
write.
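To illustrate: proprietary data can ride along in a WebVTT NOTE block (the 
block comment syntax), and a consumer that only reads cues never looks at 
it. A minimal sketch, assuming blank-line-separated blocks and plain 
newline line endings; with_vendor_note and cue_blocks are illustrative 
names, and "X-Studio-Job" is a made-up vendor field.

```python
from __future__ import annotations


def is_note(first_line: str) -> bool:
    # A NOTE block starts with "NOTE" followed by a space, a tab, or end of line.
    return first_line == "NOTE" or first_line.startswith(("NOTE ", "NOTE\t"))


def with_vendor_note(vtt_text: str, note: str) -> str:
    """Insert a NOTE block carrying proprietary text right after the header."""
    if "-->" in note or "\n\n" in note:
        raise ValueError("a NOTE block may not contain '-->' or a blank line")
    header, _, body = vtt_text.partition("\n\n")
    return f"{header}\n\nNOTE {note}\n\n{body}"


def cue_blocks(vtt_text: str) -> list[str]:
    """What a cue consumer reads: every block except the header and NOTEs."""
    blocks = [b for b in vtt_text.strip().split("\n\n") if b.strip()]
    return [b for b in blocks[1:] if not is_note(b.partition("\n")[0])]
```

The cue list is identical with or without the note, so nothing downstream 
needs a parser for its contents.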


> > Formats that have no general name-value pair syntax, e.g. CSS and 
> > JavaScript, have not suffered the _slightest_ for it.
> 
> But you're wrong: CSS IS a set of name-value pairs - that's the file 
> format.

It's not, but even the parts that appear to be name-value pairs aren't 
arbitrary name-value pairs.


> > People still put their proprietary data in those formats (e.g. 
> > "javadoc"-style documentation in JavaScript), but they do so _when 
> > they need it_, with testing, with consumers. They include their 
> > copyrights in comments, and are none the worse for it. You don't get 
> > week-long threads on forums of people asking what syntax their 
> > copyright metadata in CSS should be, because the answer is trivial: 
> > put it in a comment.
> 
> Oh, but even there you will find that there are tools that will process 
> your comments to mean something and if you don't follow their approach, 
> you don't get the advantage of getting your Copyright recognized by 
> those tools.

There's no benefit to that. Certainly none that anyone has described as a 
valid use case for WebVTT.


> > But having said that: we _should_ _always_ be looking for reasons 
> > _not_ to do something: every time we add a feature to the Web 
> > platform, it has massive long-term costs. We should be hugely 
> > reluctant to do so. It is our responsibility as language designers to 
> > keep everything out of our languages unless the cost is justified by 
> > the massive gains. The default answer to every proposal should be "no" 
> > followed only then by "why?". If we can't find a _strong_ 
> > justification, we should not include it.
> 
> Agreed, in particular for HTML which is already massive.

WebVTT is IMHO part of HTML. (Or rather, both are part of the Web 
Platform, which is already massive.)


> Saying "no" has to stop, however, where us doing nothing will simply 
> lead to us becoming irrelevant, because the world needs that particular 
> feature. It does in this case - hopefully my arguments above have been 
> able to show this.

Not in the slightest, IMHO.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 30 August 2012 17:23:22 UTC