Re: Splitting up the spec from Jonas Sicking on 2008-11-21 (public-html@w3.org from November 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 21 Nov 2008 14:20:49 -0800
To: Jim Jewett <jimjjewett@gmail.com>
CC: HTML WG <public-html@w3.org>
Message-ID: <49273441.5080302@sicking.cc>
Jim Jewett wrote:
> Jonas Sicking wrote:
> 
> == Why splitting out error handling is a bad idea.
> 
>> First of all the reason that we are in this situation with HTML being a
>> total mess to parse is in large parts because the HTML4 spec left error
>> handling undefined. This resulted in different browsers doing different
> 
> I agree that an HTML consumer needs both parsing and error handling.
> 
> But an HTML producer needs neither.
> 
> Since browsers already implement dozens of specs, having them
> implement both "HTML semantics" and "HTML parsing/processing/error
> correction" isn't that much of an extra burden.
> 
> For a simple authoring tool, such as a report generator, or a
> converter from another format, there is great value in being able to
> say "I only care about the HTML semantics spec", and not having to
> worry about the corner cases of parsing.

There are always going to be things in any given specification that some
people don't need. For example, if I write a website exclusively for
blind people I don't need <img>, <video>, or <input type=image>. Or if
I'm writing something that doesn't use script I don't care about
<script>, <canvas>, or <event-source>. Or if I'm writing poems I don't
care about <table>, <tr>, <td>, or <th> since there is no need for
tabular data. Or if I don't care about CSS I won't care about <style>,
<div>, <span>, rel=stylesheet, or style="...".

Does this mean we should break out all those parts into separate specs too?

IMHO we should not split up specs based on what use cases there are, but
rather on what makes sense from the point of view of designing the spec.
Splitting apart things that are heavily intertwined makes creating the
spec significantly harder, and also makes things harder for the people
who do care about the full range of functionality.

Yes, browsers do implement multiple specs. And more often than not that
leaves critical parts of how these specs interact undefined. This has
led to the same problems as the undefined error handling in HTML4:
every browser did what it thought was a good idea (often driven by
release schedules, internal implementation details, and accidental
decisions) and after a few years we have a hodge-podge of what each
browser did.

I'm definitely not saying that we should collapse all of W3C into a
single spec. We need to find the right level of separation. But
separating things out into separate specs has a cost that is far from
zero, especially when those specs interact heavily.

There are lots of things we can do to ensure that authors have the 
information they need. First of all we can ensure that the authoring 
parts of the spec are easy to find. This is already done here:

http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html

Second, we can write non-normative documents that are easier to
understand than a spec will ever be. There is precedent for this
within W3C, where primers have been written as separate documents from
the specification. I think in general very few people head for the spec
when wanting to author against a spec, be that HTML, CSS, xsl:fo, or
RDF. In general, tutorials and implementation documentation are the
first place most authors head. The spec is a last resort if those
documents don't contain information on a detail.

Lastly, and I think most importantly, we can supply verification tools.
As I've said before, I think the validator at http://validator.w3.org/
has done worlds more for authors than the HTML4 spec ever has.

>> Another reason splitting error handling from the 'language spec' is that
>> there are interdependencies. We've had to adjust aspects of the language
>> due to how current browsers do error handling. Otherwise we would end up
>> with a language which when sent to existing browsers would render
>> gibberish.
> 
> This does limit the choices available to the HTML semantics spec, but
> those limits apply regardless of whether the semantics section is
> split out.
> 
> At most, splitting would suggest an extra informative note on the
> order of "Yes, this seems sub-optimal, but there are legacy
> constraints."  (In the amalgamated spec, the note wouldn't be needed
> because the reader would see -- and perhaps get lost in -- the
> constraints directly.)

Unfortunately this is only true if the decision process remains exactly
as it would be if the spec were a single spec rather than three separate
ones. If not just the specs get separated, but also the processes for
designing them, then we end up with things "falling between the chairs"
and interactions being left undefined.

And as soon as Mike's document came out, people started pushing things
in the hope that the process would now be different for this new
document. I think this is inevitable if we split the specs up.

> [That said, I can't think of an example right now -- just how hairy
> were the adjustments?]

I haven't been very involved with the parsing parts of the spec, so I 
can only think of two examples. One was that we couldn't use <figure> 
and/or <legend> as we wanted because of how current browsers treat them. 
We also can't make <img> be non-empty and contain fallback contents due 
to how current browsers parse.

> == Why splitting out DOM is a bad idea.
> 
>> There are heavy interdependencies between the language and the scripting
>> model.
> 
> Not really.  There are heavy interdependencies between a few specific
> elements, and the scripting model.  I agree that most applications
> using those elements will need to be aware of the processing part of
> the spec -- but many pages just won't use those elements.
> 
>> For example <video> would not have made sense to add if the
>> scripting model hadn't been taken into account. We would have simply
>> said that <object> could have been used.
> 
> But as long as it exists, it is still useful without scripting.
> 
> I agree that <video> will probably have long sections in the
> processing spec as well.  And <canvas> probably won't even be used by
> people or tools that skip the processing spec.

My point is that the markup language and the scripting API need to be
designed together. Splitting the two up into separate specs makes that
harder and less likely to happen correctly.

> But <h1> and <div> won't need the processing spec.

But <div> is pretty much only useful if the authoring tool uses CSS. 
Does that mean we should move that into a separate spec too?

/ Jonas
Received on Friday, 21 November 2008 22:22:54 UTC