Re: compatMode and the HTML parsing algorithm from Jonas Sicking on 2011-10-04 (www-dom@w3.org from October to December 2011)

From: Jonas Sicking <jonas@sicking.cc>
Date: Mon, 3 Oct 2011 17:14:27 -0700
To: David Flanagan <dflanagan@mozilla.com>
Cc: www-dom@w3.org
Message-ID: <CA+c2ei8khNs1BOgmjWR7bkU=AWSkg-FWuwUoKsnOTw6Xp0Ywaw@mail.gmail.com>

On Mon, Oct 3, 2011 at 5:04 PM, David Flanagan <dflanagan@mozilla.com> wrote:
> The HTML parsing algorithm has steps that require one to "set the Document
> to quirks-mode" or "set the Document to limited quirks-mode". The DOM
> doesn't define any API for doing that, but does define the
> Document.compatMode attribute which depends on those settings having been
> made.  As far as I can tell, this means that it is not possible to implement
> a conforming HTML parser unless you're also implementing the DOM itself.  I
> can't, for example, write an HTML parser in JavaScript that builds a tree
> using the native DOM provided by a browser, since I can't get the correct
> behavior for compatMode.
>
> Note that I cannot just expect document.implementation.createDocumentType()
> to do the right thing based on the doctype name, publicid and systemid.  The
> parsing algorithm also sometimes forces a document into quirks mode based on
> syntax errors in the <!DOCTYPE> tag, but these syntax errors aren't visible
> once the tree-building stage of the parsing algorithm begins.
>
> I propose, therefore, that DOM4 add a 4th argument to createDocumentType().
>  If true, then the document associated with that document type would be in
> quirks mode.  If false, or omitted, then the document is in no-quirks mode
> or limited-quirks mode.  Alternatively, and for more future flexibility,
> this 4th argument could be an optional string that becomes the value of
> compatMode.
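
To make the quoted problem concrete, here is a minimal sketch in JavaScript. It assumes a script-level tokenizer that emits DOCTYPE tokens carrying the parsing algorithm's force-quirks flag. The field names, the helper functions, and the simplified quirks check are all hypothetical, and the fourth argument exists only in the proposal above, not in any DOM API.

```javascript
// Hypothetical DOCTYPE tokens as a script-level tokenizer might emit them.
// `forceQuirks` mirrors the parsing algorithm's force-quirks flag, which is
// set when the <!DOCTYPE> tag contains certain syntax errors.
const wellFormed = { name: "html", publicId: "", systemId: "", forceQuirks: false };
const malformed = { name: "html", publicId: "", systemId: "", forceQuirks: true };

// The only information that survives a three-argument
// document.implementation.createDocumentType(name, publicId, systemId) call.
function toDoctypeArgs(token) {
  return [token.name, token.publicId, token.systemId];
}

// Simplified stand-in for the parser's quirks decision. The real algorithm
// also compares the public and system identifiers against fixed lists.
function wantsQuirks(token) {
  return token.forceQuirks || token.name !== "html";
}

// Argument list for the PROPOSED four-argument form (hypothetical, not a real API).
function toProposedArgs(token) {
  return [...toDoctypeArgs(token), wantsQuirks(token)];
}

// Both tokens yield identical three-argument lists, so the DOM cannot tell
// them apart, even though the parser must treat the second one as quirks mode.
console.log(toDoctypeArgs(wellFormed), toDoctypeArgs(malformed));
console.log(toProposedArgs(wellFormed), toProposedArgs(malformed));
```

With the boolean form of the proposal, the two tokens differ only in the last argument; with the string form, the parser would pass the desired compatMode value ("BackCompat" or "CSS1Compat") in that position instead.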

I don't really like the idea of making it possible for pages to create
Documents that are in more "quirky" modes than absolutely needed.

Is anyone actually planning on implementing an HTML parser where they
don't control the DOM? I know that in the Gecko HTML parser we use a
lot of internal functions in order to improve performance.

/ Jonas

Received on Tuesday, 4 October 2011 00:15:24 UTC