- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sat, 31 Oct 2009 17:44:04 -0700
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: Alexey Proskuryakov <ap@webkit.org>, Shelley Powers <shelley.just@gmail.com>, HTML WG <public-html@w3.org>
On Oct 31, 2009, at 5:00 PM, Boris Zbarsky wrote:

> On 10/31/09 5:32 PM, Alexey Proskuryakov wrote:
>> WebKit does not use a validating parser, but it does support XHTML named
>> entities. I'm not quite sure about Firefox.
>
> Likewise. Firefox loads http://hg.mozilla.org/mozilla-central/file/4597c9ddc1ff/content/xml/content/src/xhtml11.dtd
> (which pretty much just defines the relevant named entities) when
> it detects certain doctypes. See the table at http://hg.mozilla.org/mozilla-central/file/4597c9ddc1ff/parser/htmlparser/src/nsExpatDriver.cpp#l287
>
> Firefox can also load external DTDs if they satisfy certain
> constraints (e.g. being installed as part of the app itself). See http://hg.mozilla.org/mozilla-central/file/4597c9ddc1ff/parser/htmlparser/src/nsExpatDriver.cpp#l785
>
> The DTD is only really used for ID attribute names and named
> entities; no validation is performed.

It would be good for some spec to define the Gecko/WebKit behavior.

The crux of the issue is this. The XML spec allows XML processors to be one of the following:

A) A validating processor, which must read all external DTDs, process the declared entities, expand the entities when appropriate, and also report violations of DTD constraints.

B) A non-validating processor, which does not read external DTDs and does not provide any entities other than the ones predefined by XML and any defined in the internal subset.

Neither A nor B is practical for the Web. Running a full validating parser is too heavyweight, so A is not an option. But there's XHTML content out there that does use XHTML entities, and failing to expand these entity references results in undesired behavior. So B is also not an option.

In theory, interoperable XML content should never use anything but the built-in XML entities, unless it can guarantee that it will only ever be processed by a validating parser. In practice, that's not what happens. XHTML content uses the XHTML entities.
And browsers that don't handle them are perceived as broken.

In practice, browsers adopt a compromise: they recognize certain DTDs and define the relevant entities, but without validating. This is arguably against the spirit of the XML spec, but I think it is the practical choice.

Regards,
Maciej
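
The compromise described above (recognize a known doctype, supply its entity declarations from a local copy, never fetch the DTD and never validate) can be sketched with Python's standard xml.parsers.expat module; the Firefox code linked above is likewise an Expat driver. This is a minimal sketch under stated assumptions, not Gecko's or WebKit's implementation: the names parse_text, KNOWN_PUBLIC_IDS and XHTML_ENTITY_DECLS are hypothetical, the public-identifier set is an illustrative subset rather than Firefox's actual table, and the entity list is cut down to three entries instead of the full XHTML set.

```python
from xml.parsers import expat

# Hypothetical, heavily abbreviated stand-in for a locally shipped entity
# DTD (Firefox uses a local xhtml11.dtd); a real table has all XHTML entities.
XHTML_ENTITY_DECLS = """
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
<!ENTITY eacute "&#233;">
"""

# Hypothetical subset of public identifiers treated as "known" doctypes.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}


def parse_text(document: str) -> str:
    """Parse an XML document and return its concatenated character data.

    For a recognized doctype, entity declarations come from the local table
    above. Nothing is fetched from the network and nothing is validated.
    """
    parser = expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append

    def external_entity_ref(context, base, system_id, public_id):
        # Expat passes context=None for the external DTD subset named in
        # the doctype. For a known public identifier, parse the local
        # declarations in its place; they go into the DTD shared with
        # the parent parser, so the document's entity references resolve.
        if context is None and public_id in KNOWN_PUBLIC_IDS:
            sub = parser.ExternalEntityParserCreate(context)
            sub.Parse(XHTML_ENTITY_DECLS, True)
        # A nonzero return tells Expat the reference was handled;
        # unrecognized DTDs are simply never loaded.
        return 1

    # Ask Expat to report the external DTD subset to our handler.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(document, True)
    return "".join(chunks)
```

For example, parse_text on a document with the XHTML 1.0 Strict doctype and the body <p>caf&eacute;&nbsp;&copy;</p> yields "café", a no-break space, and "©". The design choice mirrors the message: the known-doctype lookup replaces both options A and B, since the parser neither validates nor limits itself to the five predefined XML entities.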
Received on Sunday, 1 November 2009 00:44:39 UTC