Re: Metadata in the VTT file header (bug 15851), use cases (and a need to close this) from Ian Hickson on 2012-08-31 (public-texttracks@w3.org from August 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 31 Aug 2012 00:56:00 +0000 (UTC)
To: public-texttracks <public-texttracks@w3.org>
Message-ID: <Pine.LNX.4.64.1208310032550.614@ps20323.dreamhostps.com>

On Thu, 30 Aug 2012, Glenn Maynard wrote:
> 
> > This is the fallacy that led to XML.
> 
> "This is the fallacy that led to JSON."

Yes, JSON is similar in this respect. You'll notice only one format in 
HTML uses JSON, despite HTML having around a dozen custom formats. This is 
because JSON is not a magical syntax that solves every problem. (And in 
the case of the JSON microdata syntax, it's not the primary format, and 
it's actually a lossy representation of the data in question.)


> XML was a failure, but not because it tried to provide a standard 
> syntax.

XML wasn't a failure, by a long shot. Nor is JSON. They both have 
situations in which they are useful.

However, where they have both failed (and where XML was intended to 
succeed -- JSON wasn't, so it's normal that it "failed" here) is in being 
a generic format intended for generic consumers. (JSON is positioned as a 
data-interchange format _between specific consumers of specific 
vocabularies_. XML was positioned as being useful with generic consumers.)


> > Providing "frameworks for future data if needed" is an anti-goal and a 
> > language design anti-pattern.
> 
> No, modifying parsers and APIs for every new piece of data is an 
> "anti-pattern".

For every piece of data, sure. Nobody is suggesting that, as far as I'm 
aware.


> > I'm not recommending any more than anyone else, I'm just saying we 
> > shouldn't try to second-guess their needs and offer a specific place 
> > for them other than just comments.
> 
> This does nothing more than store strings; it makes no attempt at 
> arbitrary data types, high-level structure or "second-guessing".

Guessing that you only need to store strings is still a guess. Currently 
there's only one piece of information proposed for WebVTT that makes sense 
to have in VTT and that fits in the form of a simple one-line string: the 
language (and it's really a complicated structured data format itself, not 
really a freeform string). There have been a number of other proposals for 
things that might go here, and while I disagree that they make sense in 
VTT, even amongst those not everything is a string. For example, "default" 
is really a boolean, not a string, so we'd have to add conventions on top 
of the format, beyond the string value, to determine what it means 
(presumably what matters there is presence/absence, not the value, so the 
empty string would be positive, not negative). That's if we even tried to 
use a string form to store it, which IMHO is a bad idea (as I think HTML 
attributes have shown for the many boolean attributes there).
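
To make the boolean problem concrete, here is a rough sketch. The 
"name: value" header syntax and all the function names are hypothetical, 
made up for illustration; nothing like this is specified for WebVTT. It 
shows two plausible readings of a string-valued "default" field 
disagreeing on the same input.

```python
def parse_header(lines):
    """Parse hypothetical 'name: value' lines into a dict of strings."""
    fields = {}
    for line in lines:
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def is_default_by_presence(fields):
    # Presence/absence convention (as with HTML boolean attributes):
    # the field being present at all means true, whatever its value.
    return "default" in fields


def is_default_by_value(fields):
    # Naive value-based convention: an empty string or "false" means false.
    value = fields.get("default", "")
    return value != "" and value.lower() != "false"


fields = parse_header(["language: en", "default:"])
print(is_default_by_presence(fields))  # True
print(is_default_by_value(fields))     # False
```

The string format alone doesn't say which reading is right; that has to 
be layered on as an extra convention.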


> I'm glad that's not what we did, then.  We saw that a set of use cases 
> all have the same structure--an identifier and a string--and designed a 
> clean solution that fits them all well.

What use case has identifiers??? I'm not aware of any! There are some that 
have strings (language and styling come to mind), but the use cases don't 
have identifiers; you'd have to add one to make it make sense to use 
name-value pairs.


> > > I'm pretty sure the use case presented was the *standard* parts of 
> > > the workflow (eg. the language and kind fields, which are later 
> > > consumed by the WebM muxing tool or an HTML generator outputting 
> > > <track> fields), not proprietary workflow.
> >
> > It's hard for me to know since there haven't been any concrete 
> > examples so far.
> 
> Of course there have.  "Language" and "kind".

It's not at all clear to me that that information should be inline. That 
will just lead to inconsistent data, as we've seen with e.g. Content-Type 
headers and character encoding labels. As I've said before, for the case 
of "muxing", i.e. where there's data in the file for the purpose of the 
editing workflow, it's not clear that you'd ever want that data to survive 
outside the editing workflow, and so I don't see why we need to define 
anything here. In particular, even if we do define something, I don't see 
why anyone would care -- they'd want to include whatever information they 
need for their workflow, e.g. the filename of the source video, the 
percentage done, the name of the QA person, the memory state of the 
editor, etc, and thus would end up doing their own stuff anyway.

If the data is proprietary, the syntax doesn't need to be standard.


> On Thu, Aug 30, 2012 at 5:06 AM, Simon Pieters <simonp@opera.com> wrote:
> >
> > STYLE
> > #foo { color:green }
> > i { font-family:serif }
> 
> How does this represent blank lines?  Editing software should allow 
> people to paste in CSS, and when they come back to it later, show the 
> original text they entered in its original format, without blank lines 
> stripped.

Blank lines are not meaningful in CSS, so you can just strip them.
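
As a minimal sketch of what stripping would involve (the function name is 
hypothetical, and this assumes the input is ordinary CSS with no escaped 
line continuations inside strings):

```python
def strip_blank_lines(css_text):
    """Drop empty and whitespace-only lines from a block of CSS text."""
    return "\n".join(
        line for line in css_text.splitlines() if line.strip()
    )


css = "#foo { color:green }\n\ni { font-family:serif }\n"
print(strip_blank_lines(css))
# #foo { color:green }
# i { font-family:serif }
```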

Within the editing workflow, before publication, editors can use whatever 
format they want. WebVTT does not pretend to be an editing workflow 
format, nor would it be anything close to a good choice for such a format.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 31 August 2012 00:56:23 UTC