Re: XHTML character entity support from Maciej Stachowiak on 2009-11-01 (public-html@w3.org from November 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 01 Nov 2009 13:15:15 -0800
To: Shelley Powers <shelley.just@gmail.com>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, Alexey Proskuryakov <ap@webkit.org>, HTML WG <public-html@w3.org>
Message-id: <BE70AEF8-AAF3-4D3C-A066-B1BB9D8E6EC0@apple.com>

On Nov 1, 2009, at 6:13 AM, Shelley Powers wrote:

>
> This isn't a case of "breaking" the web: the specifications are clear
> in how named entities are handled. There are five predefined entities
> for XML, and several for HTML4 based on the HTML4 DTD. The addition of
> new named entities in XML is based on the use of DTDs, whether
> external or internal. There are 253 in total for XHTML based on DTDs,
> but only five of these are available to XML parsers that don't read
> external DTDs. XML parsers do not have to read the external DTD.

Clarity of the specifications doesn't mean you can do what they say  
without breaking the web. The specifications say it's your choice  
whether to support entities from the XHTML DTD or not, but in practice  
content relies on browsers doing so (in part because DTD-based  
validators said it was ok). So there's no real choice.
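
To make the quoted point about the five predefined entities concrete, here is a minimal sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for a non-validating parser that reads no DTD; that parser choice and the helper name `parses` are illustrative and do not come from this thread.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if the parser accepts the document without reading any DTD."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The five entities predefined by XML itself need no declaration.
predefined_ok = parses("<p>&lt; &gt; &amp; &apos; &quot;</p>")

# &nbsp; is declared only in the XHTML DTDs, which are not read here,
# so this parser rejects the reference as undefined.
nbsp_ok = parses("<p>&nbsp;</p>")

print(predefined_ok, nbsp_ok)
```

Content written against browsers that hard-code the XHTML entities would hit the second case in a parser like this one.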

> If we change the document to allow additional named entities into
> XHTML5, existing XML parsers that read DTDs (validating parsers) will
> end up throwing errors when encountering an XHTML5 document that has
> anything other than the five predefined entities. They will have to be
> edited to "special case" XHTML5, just because XHTML5 is no longer well
> formed XML.

The above wouldn't apply to documents with no doctype declaration,
only ones with an XHTML 1.0 DTD. I believe I explained this in another
message. (However, use of undeclared entities does not make an XML
document fail to be well-formed.)

> There was never an issue of consistency before, because even though
> the browsers are not validating parsers, the doctypes they hard coded
> do have support for named entities, and therefore they are 'emulating'
> validating parsers. There is no inconsistent result between the true
> validating parser, and the faux validating parser (at least in this
> context).

[...]

>
> But there is no DTD for HTML5[1]. Not even the XHTML version. Either
> we'll have inconsistent results (and errors) if people use named
> entities, or every validating XML parser and parser library in the
> world that potentially will need to parse XHTML5 will need to be
> modified to adapt to the W3C's implementing a policy to deliberately
> create malformed XML.

This makes me think you have a different understanding of the request  
than I do. Here is the rule I think should be specified:

* Rule A: "XML documents that start with the XHTML 1.0 doctype or  
XHTML 1.1 doctype should always be parsed with the XHTML 1.x set of  
entities by an HTML5 UA, even if it is not otherwise a validating XML  
processor."

You seem to be arguing against a rule like this:

* Rule B: "XML documents that have no doctype declaration should  
always be parsed with the XHTML 1.0 set of entities by an HTML5 UA,  
even though they are not declared anywhere."

I don't believe anyone is arguing in favor of Rule B (though I could  
be wrong). Do you have a problem with Rule A?
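
The difference between the two rules can be sketched as a lookup keyed on the doctype's public identifier. This is only an illustration under stated assumptions: the function name `entity_set_for`, the choice to match on the public identifier, and the use of Python's html.entities table (the HTML 4 names, with `apos` added as in the XHTML 1.0 DTDs) are all hypothetical and not taken from any specification text or UA implementation.

```python
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# Public identifiers of the XHTML 1.0 and XHTML 1.1 doctypes.
XHTML_1X_PUBLIC_IDS = frozenset({
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
})

# The five entities every XML parser must know.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Approximation of the XHTML 1.x set: the HTML 4 names plus the predefined five.
XHTML_1X_ENTITIES = {name: chr(cp) for name, cp in name2codepoint.items()}
XHTML_1X_ENTITIES.update(PREDEFINED)


def entity_set_for(public_id):
    """Pick the entity table for a document (hypothetical helper).

    public_id is the doctype's public identifier, or None when the
    document has no doctype declaration.
    """
    if public_id in XHTML_1X_PUBLIC_IDS:
        # Rule A: a recognized XHTML 1.x doctype gets the XHTML 1.x entities.
        return XHTML_1X_ENTITIES
    # No doctype (or an unrecognized one): only the predefined entities.
    # Rule B would instead return XHTML_1X_ENTITIES here as well.
    return PREDEFINED
```

Under this sketch, `entity_set_for("-//W3C//DTD XHTML 1.0 Strict//EN")` contains `nbsp`, while `entity_set_for(None)` does not.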

Regards,
Maciej

Received on Sunday, 1 November 2009 21:15:50 UTC