Re: XHTML character entity support

On Sun, Nov 1, 2009 at 3:15 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Nov 1, 2009, at 6:13 AM, Shelley Powers wrote:
>
> This isn't a case of "breaking" the web: the specifications are clear
> in how named entities are handled. There are five predefined entities
> for XML, and several for HTML4 based on the HTML4 DTD. The addition of
> new named entities in XML is based on the use of DTDs, whether
> external or internal. There are 253 in total for XHTML based on DTDs,
> but only five of these are available to XML parsers that don't read
> external DTDs. XML Parsers do not have to read the external DTD.
>
> Clarity of the specifications doesn't mean you can do what they say without
> breaking the web. The specifications say it's your choice whether to support
> entities from the XHTML DTD or not, but in practice content relies on
> browsers doing so (in part because DTD-based validators said it was ok). So
> there's no real choice.

And I don't normally have a problem with this, as long as there's no
possibility of inconsistencies arising from this little browser
shortcut.
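
To make the non-DTD-reading case concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: it uses the
standard library's xml.etree.ElementTree (built on expat, which does not
fetch external DTDs), and the helper name parse_text is hypothetical.
It compares a predefined entity, an undeclared XHTML entity, and the same
entity declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

def parse_text(doc):
    """Return the root element's text, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

# One of the five entities predefined by XML: resolves with no DTD at all.
predefined = parse_text("<p>&amp;</p>")

# An XHTML entity with no declaration anywhere: the parser reports an error.
undeclared = parse_text("<p>&nbsp;</p>")

# The same entity declared in an internal DTD subset: resolves normally.
declared = parse_text('<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>&nbsp;</p>')

print(repr(predefined), repr(undeclared), repr(declared))
```

Other parsers and libraries may handle these cases differently, which
is the kind of inconsistency I'm concerned about.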

>
> If we change the document to allow additional named entities into
> XHTML5, existing XML parsers that read DTDs (validating parsers) will
> end up throwing errors when encountering an XHTML5 document that has
> anything other than the five predefined entities. They will have to be
> edited to "special case" XHTML5, just because XHTML5 is no longer well
> formed XML.
>
> The above wouldn't apply to documents with no doctype declaration, only ones
> with an XHTML 1.0 DTD. I believe I explained this in another message.
> (However, use of undeclared entities does not make an XML document fail to
> be well-formed).
>



> There was never an issue of consistency before, because even though
> the browsers are not validating parsers, the doctypes they hard coded
> do have support for named entities, and therefore they are 'emulating'
> validating parsers. There is no inconsistent result between the true
> validating parser, and the faux validating parser (at least in this
> context).
>
> [...]
>
> But there is no DTD for HTML5[1]. Not even the XHTML version. Either
> we'll have inconsistent results (and errors) if people use named
> entities, or every validating XML parser and parser library in the
> world that potentially will need to parse XHTML5 will need to be
> modified to adapt to the W3C's implementing a policy to deliberately
> create malformed XML.
>
> This makes me think you have a different understanding of the request than I
> do. Here is the rule I think should be specified:
> * Rule A: "XML documents that start with the XHTML 1.0 doctype or XHTML 1.1
> doctype should always be parsed with the XHTML 1.x set of entities by an
> HTML5 UA, even if it is not otherwise a validating XML processor."

Disagree. It may be browser practice, but XHTML isn't specific just to
browsers. It may be practice, and non-harmful (though, as we can see,
inconsistently applied), but I don't think that's a reason to validate
browser company behavior.

> You seem to be arguing against a rule like this:
> * Rule B: "XML documents that have no doctype declaration should always be
> parsed with the XHTML 1.0 set of entities by an HTML5 UA, even though they
> are not declared anywhere."
>
> I don't believe anyone is arguing in favor of Rule B (though I could be
> wrong). Do you have a problem with Rule A?

See above. Again, I don't have a problem with what browser companies
do, as long as the results are consistent with validating XML parsers.
But I don't think there's any reason to codify shortcuts in the HTML5
spec.

As it is, I'm not sure if the issue related to the original request
was specifically about named entities in HTML5, or the fact that
browser companies are inconsistent -- they allow named entities in the
document, but not in the innerHTML page fragments.

Again, though, I don't think this requires codifying in HTML5. Perhaps
bugs need to be filed with browser companies that provide named
entities based on the XHTML 1.0 doctypes, but don't provide the named
entities in innerHTML.

Alexey, can you provide more details about exactly what you want when
you file the bug? Or in an email to this thread?

> Regards,
> Maciej
>

Shelley

Received on Sunday, 1 November 2009 21:43:36 UTC