Re: HTML5: clean and non-clean from Henri Sivonen on 2008-10-10 (www-tag@w3.org from October 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 10 Oct 2008 17:02:52 +0300
To: Julian Reschke <julian.reschke@gmx.de>
Cc: www-tag@w3.org
Message-Id: <9FA55223-429D-4F41-8EBC-AED8783699D3@iki.fi>

On Oct 10, 2008, at 16:03, Julian Reschke wrote:

> Henri Sivonen wrote:
>> In reference to:
>> http://www.w3.org/2001/tag/2008/09/23-minutes
>> The minuted discussion mentions "clean" and "not-clean" several  
>> times and separating them.
>> What kind of separation was meant in the discussion?
>> HTML5 already says what is conforming (i.e. "clean"?). In some  
>> cases in the minutes, it *seems* that the discussion was about  
>> splitting the processing model instead of splitting out document  
>> conformance definition. If this was indeed what was discussed, what  
>> kind of implementor would benefit from reading only the "clean"  
>> part of the processing model?
>
> Probably none.
>
> However there are far more authors/content producers than  
> implementors, and those would benefit a lot.

What kind of document would they benefit from?

From a standalone conformance definition without a processing model?  
From a "clean" but implementation-unmatching processing model  
document? From something else?

> As Noah pointed out:
[...]
> "Actually, where I'm scribed as saying "separate permissive behavior  
> from clean behavior" isn't quite the nuance I had in mind. I think a  
> language specification indicates which documents are legal, and what  
> they mean.

Maintaining such a document as a separate file would probably be  
burdensome at this point. There has been talk about annotating the  
spec in such a way that style rules could hide parts that authors  
presumably don't need to know.

Also note that there is already a section called "Writing HTML  
documents" that declaratively states what the parsing algorithm parses  
without hitting parse errors. This section, being precise, isn't  
really suitable for casual authors. It's suitable for spec lawyers who  
don't wish to step through the parsing algorithm to see if a given  
string is free of parse errors. Casual authors need separate tutorials  
or O'Reilly books anyway, since a spec must cover all cases but  
pedagogic material needs to cover a subset of conforming cases that is  
broad enough to get stuff done.

> That's one spec. I think HTML 5 as drafted also includes a  
> specification for pieces of code we might call browsers, which by  
> the way attempt to provide useful output for content that would not  
> be "legal" in the language spec, e.g. improperly nested elements. I  
> think having both specifications is very important, but I would  
> prefer that the browser specification, including fixup of bad  
> content, was separate from the specification of the clean language  
> and its correct interpretation. The former spec. would be for  
> authors and for those who might in future be able to deploy less  
> permissive UAs;

What processing rules in HTML5 are so burdensome that one should  
expect a UA implementor to opt to make his/her product less  
compatible in order to simplify code? Note that finding out what the  
consequences of deviating from the spec are can be more costly than  
just going ahead and implementing what the spec says. After all, the  
point of having a processing model spec is that implementors don't  
need to try out what works.

That is, why bother with doing spec work that *might* in the *future*  
enable the deployment of less permissive UAs? Is the foreseeable  
benefit greater than the trouble of getting there? (What's the  
expected benefit?)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 10 October 2008 14:13:00 UTC