- From: Glenn Maynard <glenn@zewt.org>
- Date: Thu, 30 Aug 2012 18:37:59 -0500
- To: Simon Pieters <simonp@opera.com>
- Cc: Ian Hickson <ian@hixie.ch>, David Singer <singer@apple.com>, public-texttracks <public-texttracks@w3.org>
- Message-ID: <CABirCh9hNomGjk9hHkkH-+PnUshZ9C6bjJuAzOu7Vib-J6LWMA@mail.gmail.com>
On Thu, Aug 30, 2012 at 12:22 PM, Ian Hickson <ian@hixie.ch> wrote:

> What difference does it make if we provide a syntax or not? The data is
> still proprietary, the value is still in a proprietary syntax.

I gave several reasons why it's very useful to have a single syntax (not modifying parsing libraries, not modifying formatters). It would be unfortunate if html5lib had to be modified every time a new attribute or element was added to HTML.

> This is the fallacy that led to XML. People thought that providing a
> standard syntax would mean everything would interoperate and we could dump
> HTML overboard. It doesn't work that way. What matters isn't the syntax,
> syntax is easy. What matters is the semantics.

"This is the fallacy that led to JSON."

XML was a failure, but not because it tried to provide a standard syntax.

> Providing "frameworks for future data if needed" is an anti-goal and a
> language design anti-pattern.

No, modifying parsers and APIs for every new piece of data is an "anti-pattern".

> I'm not recommending any more than anyone else, I'm just saying we
> shouldn't try to second-guess their needs and offer a specific place for
> them other than just comments.

This does nothing more than store strings; it makes no attempt at arbitrary data types, high-level structure or "second-guessing".

> > It's the right solution for the *aggregate* of these use cases.
>
> That's not how language design works. You don't pick an arbitrary set of
> use cases and design a solution that fits all of them poorly.

I'm glad that's not what we did, then. We saw that a set of use cases all have the same structure--an identifier and a string--and designed a clean solution that fits them all well.

> The complex part of adding CSS to WebVTT implementations is not the syntax
> for adding a new block to the WebVTT parser. That part is trivial.

It doesn't matter if it's "trivial", when the work that needs to be done is in a system library.
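To make the "an identifier and a string" point concrete, here is a minimal Python sketch. It assumes single-line headers of the form `name: value`, as in Simon's `language: fr` / `kind: subtitles` example; the function name `parse_header_fields`, the case-insensitive field names, and the `x-editor-note` field are hypothetical, not taken from any WebVTT draft. Because every value is kept as an uninterpreted string, adding a new field requires no change to the parser.

```python
def parse_header_fields(lines):
    """Collect 'name: value' header lines into a dict of strings.

    Hypothetical sketch: each header is assumed to be one line of the form
    'identifier: value', and names are compared case-insensitively.
    """
    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # not a 'name: value' line; skipped in this sketch
        # The parser never interprets the value, so unknown names pass through.
        fields[name.strip().lower()] = value.strip()
    return fields


# "x-editor-note" is a made-up field: it is read without any parser change.
headers = parse_header_fields(["language: fr", "kind: subtitles", "x-editor-note: draft 3"])
print(headers["language"], headers["kind"], headers["x-editor-note"])  # fr subtitles draft 3
```

Consumers such as a muxer or an HTML generator would then look up only the names they understand (`language`, `kind`) and ignore the rest.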
> This is the same kind of reasoning that leads to things like XML. It does
> not lead to simple languages.

"This is the same kind of reasoning that leads to things like JSON. It does not lead to simple languages."

You're using XML as an argument that generic file formats are always bad. XML was bad due to massive overdesign--I had a phonebook-sized book on XML at one point back in the day--and because it structurally didn't map to real-world data structures. It wasn't bad because it's impossible to make clean, general-purpose file formats; JSON proved that.

> > I'm pretty sure the use case presented was the *standard* parts of the
> > workflow (eg. the language and kind fields, which are later consumed by
> > the WebM muxing tool or an HTML generator outputting <track> fields),
> > not proprietary workflow.
>
> It's hard for me to know since there haven't been any concrete examples so
> far.

Of course there have. "Language" and "kind".

On Thu, Aug 30, 2012 at 1:14 PM, David Singer <singer@apple.com> wrote:

> >> Also, &escapes; need to be allowed, so that "-->" can be escaped to
> >> "--&gt;". This is needed for single-line headers, too.
>
> Why? We're in a block of text whose only parsing rule is that it
> terminates in a blank line. What am I missing?

See step 14 of the parser. The header loop ends when it sees "-->".

(I wouldn't object to that being changed; IMO it's pretty ugly. I think it's this ugly because it assumes no knowledge of what headers look like--which is something this proposal can fix. If the header format is defined before we're stuck with that part of the parser, then the parser can do something more sensible.)

On Thu, Aug 30, 2012 at 5:06 AM, Simon Pieters <simonp@opera.com> wrote:

> WEBVTT
> language: fr
> kind: subtitles
>
> STYLE
> #foo { color:green }
> i { font-family:serif }

How does this represent blank lines?
Editing software should allow people to paste in CSS, and when they come back to it later, show the original text they entered in its original format, without blank lines stripped.

> foo
> 00:00:00.000 --> 00:00:05.000
> testing <i>testing</i>
>
> This is backwards compatible! No escapes needed! It doesn't support
> embedding "-->", but I'd rather change the parser to only be aggressive
> about "-->" after the first real cue than to support &escapes; in metadata
> values.

I believe the point of the "-->" special case (parser step 14) is to tolerate when people omit the post-header blank line, eg.

    WEBVTT
    00.000 --> 05.000
    ...

Without it, this cue would be thrown away by the header loop. (It's syntactically incorrect, but it's going to be a very common error, so the parser tries to deal with it anyway. Otherwise, the failure mode is having the first cue silently discarded, which is subtle and would probably lead to a lot of captions in the wild missing their first caption. You still lose the cue identifier, but that's much less common.)

I do think this error handling could be done better, since it should be possible to distinguish between a cue timing line and a header line, and if this isn't put off too long this may be doable without compatibility problems. For example, instead of treating any line containing "-->" as breaking out of the header loop, only do so if the line is neither a header line nor inside a multiline header block. That means:

    Language: en
    00.000 --> 05.000
    ...

would see the "-->" and break out, but

    Style: |
    .stuff::after { content: "john went --> home"; }
    .

wouldn't. This would still handle this common authoring error in most cases, but without needing escaping in headers. (To guarantee that header starting lines are never ambiguous with cue timings, require that headers begin with a letter, so they can never look like a WebVTT timestamp.)

--
Glenn Maynard
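The header-loop rule proposed in this message can be sketched in Python as follows. This is a hypothetical illustration, not the WebVTT parser algorithm: the `Name: |` ... `.` multiline syntax is inferred from the `Style: |` example, header names are assumed to be a letter followed by letters, digits, `_` or `-`, and `read_header` is a made-up name. It takes the lines after the `WEBVTT` signature and returns the parsed headers plus the index where cue parsing should resume.

```python
import re

# Assumption: a header line is "Name: value" where Name starts with a letter,
# so it can never be confused with a timestamp such as "00:00.000".
HEADER_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*):(.*)$")


def read_header(lines):
    """Return (headers, index of the first line after the header block).

    Hypothetical sketch of the proposed rule: a line containing "-->" ends the
    header early only if it is neither a header line nor inside a multiline
    "Name: |" block (terminated by a line holding a single ".").
    """
    headers = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "":
            return headers, i + 1  # normal case: a blank line ends the header
        match = HEADER_LINE.match(line)
        if match is None:
            if "-->" in line:
                return headers, i  # missing blank line: treat as a cue timing line
            i += 1  # any other stray line is ignored in this sketch
            continue
        name, value = match.group(1), match.group(2).strip()
        i += 1
        if value == "|":
            # Multiline value: keep every line (blank lines and "-->" included)
            # until the terminating ".".
            body = []
            while i < len(lines) and lines[i] != ".":
                body.append(lines[i])
                i += 1
            if i < len(lines):
                i += 1  # skip the terminating "."
            value = "\n".join(body)
        headers[name] = value
    return headers, i


# Header without the blank line: "-->" on a non-header line ends the header.
lines = ["Language: en", "00.000 --> 05.000", "..."]
headers, start = read_header(lines)
print(headers, lines[start])  # {'Language': 'en'} 00.000 --> 05.000
```

In this sketch a multiline block also keeps its blank lines, so pasted CSS would round-trip unchanged; whether an eventual header syntax would behave that way is an assumption, not something settled in this thread.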
Received on Thursday, 30 August 2012 23:38:28 UTC