Re: An HTML language specification from Henri Sivonen on 2008-11-21 (public-html@w3.org from November 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 21 Nov 2008 12:08:35 +0200
To: Rob Sayre <rsayre@mozilla.com>
Cc: public-html@w3.org
Message-Id: <76FAB176-5A5B-4239-BB01-F0BDB4EA8E2D@iki.fi>

On Nov 21, 2008, at 11:16, Rob Sayre wrote:

> Henri Sivonen wrote:
>> How would Mozilla work have benefited from the parsing algorithm  
>> being in a different document?
>
> I was thinking of the frequent request for DOMParser to handle text/ 
> html. For this use, you probably don't want scripts executing, but  
> you probably don't want <noscript> parsing either. This desired tree  
> output is similar to server side uses I've observed.

Yeah, it would be useful to specify how the scripting state works with  
DOMParser.

Currently, the XHR2 spec says: "If final MIME type is text/html let  
document be an object implementing the Document interface that  
represents the response entity body parsed following the rules set  
forth in the HTML specification for an HTML parser with scripting  
disabled and then terminate this algorithm. [HTML5]"
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/

I would guess that if that behavior is wrong for the use case of  
DOMParser, it is wrong for the use cases of XHR, too. I think  
the XHR2 spec or a spec for DOMParser could say "let document be an  
object implementing the Document interface that represents the  
response entity body parsed following the rules set forth in the HTML  
specification for an HTML parser with scripting enabled but without  
executing scripts and then terminate this algorithm" without any HTML  
5 spec refactoring.

> The Mozilla work would have benefited from a clear, complete, and  
> finished document on HTML parsing and tokenization. I don't see why  
> this document needs to be tied to a SQL API.
>
> Really, it's the publication schedules and revisions that are  
> interesting, not the division of the document. I think this is  
> obvious though, and I find all of the word games about "separate  
> documents" to be quite counterproductive.

The WHATWG copy of the spec already has low-bureaucracy maturity  
indicators for sections. Managing different maturity levels of  
different parts of what is now a monolithic spec in the W3C/IETF way  
adds bureaucracy and causes artificial problems when seeking to do  
honest normative cross-referencing.

I think that it's quite possible that the way the W3C and IETF manage  
spec maturity levels is less practical for speccing the interoperable  
browser platform than the way many countries manage their legal code  
(constantly patching a big book with insanely complex cross-references  
including mutual referencing between various titles/acts).

The main problems with the W3C/IETF model are:
  1) In order for a more mature spec to reference a mature section  
inside an otherwise less mature spec, the latter needs to be split,  
causing more bureaucracy.
  2) Circular references lock specs into advancing together in  
maturity levels.

I'm not suggesting that all the specs that cover pieces of the  
interoperable browser platform should be folded into one huge spec,  
but I am inclined to think that the bureaucracy flowing from the  
maturity rules related to normative cross references isn't productive  
and doesn't necessarily serve its purpose. (If SVG references CSS2  
instead of CSS2.1, who benefits from readers having to have the tacit  
knowledge that everyone is supposed to go read CSS 2.1 instead of CSS2?)

As for the maturity of the parsing section specifically, I think it  
cannot mature more from where it is now before we get browser builds  
with an implementation of the current draft to testers.

>>>> ... It's not horribly intertwined but there are some  
>>>> dependencies ...
>>> I agree. That's why I don't think splitting parsing *and*  
>>> vocabulary into a separate document is unreasonable on its face.
>>
>> I don't find it unreasonable on its face. (For MathML and SVG  
>> elements, text/html parsing and the vocabulary are already in  
>> separate documents.) However, I think here we should allow the  
>> person who does the work to use the spec organization that suits his  
>> work pattern, because having the parsing and vocabulary in the same  
>> document isn't unreasonable on its face, either.
>
> Isn't this whole thread an uproar about someone else doing some work?

The "language spec" uproar is partly about splitting away something  
that *is* strongly connected with the parts that it left out.

> What if someone proposed taking some of these not horribly  
> intertwined sections and putting them in a separate document (and  
> doing the work)? Is that heretical?


No, it's not:
http://lists.w3.org/Archives/Public/public-html/2008Oct/0127.html

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 21 November 2008 10:09:17 UTC