Re: XHTML character entity support from Shelley Powers on 2009-11-01 (public-html@w3.org from November 2009)

From: Shelley Powers <shelley.just@gmail.com>
Date: Sun, 1 Nov 2009 08:13:20 -0600
To: Maciej Stachowiak <mjs@apple.com>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, Alexey Proskuryakov <ap@webkit.org>, HTML WG <public-html@w3.org>
Message-ID: <643cc0270911010613i1782e192o6929f360385b18dc@mail.gmail.com>

On Sat, Oct 31, 2009 at 9:53 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Oct 31, 2009, at 8:46 PM, Shelley Powers wrote:
>
> I've not seen good, technical reasons for this move. In this thread,
> I've read that browser companies have enabled named entity handling
> because of compatibility bugs, even though the bugs were, technically,
> invalid. I've read that since this is what has happened in the past,
> seemingly we'll have to support it in the future. And lastly, since
> some browsers have implemented this approach, HTML5 should make it all
> OK.
>
> I pretty much agree with your summary. Except that I think what you
> described *is* a good technical reason for making a change to the spec.
> Regards,
> Maciej
>

I would say it demonstrates, rather, a lack of discipline on the part of
the browser implementers, as well as less adherence to standards than
their press releases tout.

This isn't a case of "breaking" the web: the specifications are clear
about how named entities are handled. There are five predefined entities
for XML, and 252 for HTML4, defined in the HTML4 DTDs. Additional named
entities in XML come from DTD declarations, whether in the external or
the internal subset. There are 253 in total for XHTML based on its DTDs,
but only five of these are available to XML parsers that don't read
external DTDs, and non-validating XML parsers are not required to read
the external DTD.
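To make the distinction concrete, here is a minimal sketch using Python's xml.etree.ElementTree, which sits on expat, a non-validating parser that does not fetch external DTDs. The choice of library and of &eacute; as the sample HTML4 entity are my own assumptions for illustration. It shows the five predefined entities parsing anywhere, an HTML4 entity failing when nothing declares it, and the same entity working once an internal DTD subset declares it.

```python
import xml.etree.ElementTree as ET

# The five entities every XML parser must recognize with no DTD at all.
PREDEFINED = '<p>&amp; &lt; &gt; &quot; &apos;</p>'

# An HTML4/XHTML named entity that no DTD declares.
UNDECLARED = '<p>caf&eacute;</p>'

# The same entity, declared in an internal DTD subset (expat reads these).
DECLARED_INTERNALLY = (
    '<!DOCTYPE p [<!ENTITY eacute "&#233;">]>'
    '<p>caf&eacute;</p>'
)


def try_parse(doc):
    """Return the element's text, or the parser's error message."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        # expat rejects the undeclared entity as a well-formedness error.
        return f"error: {err}"


for doc in (PREDEFINED, UNDECLARED, DECLARED_INTERNALLY):
    print(try_parse(doc))
```

Only the first and third documents yield text; the second reports an "undefined entity" error, which is what any XML consumer would see for an XHTML5 document using HTML4 entity names without a DTD.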

I'm not quibbling about what happens with HTML: frankly, I lost interest
in it over the unquoted attribute values. It's already sloppy and
undisciplined.

But I'm not willing to introduce the same level of sloppiness into the
XHTML serialization of HTML5.

If we change the document to allow additional named entities in
XHTML5, existing XML parsers that read DTDs (validating parsers) will
end up throwing errors when they encounter an XHTML5 document that uses
anything other than the five predefined entities. They will have to be
modified to "special-case" XHTML5, simply because XHTML5 is no longer
well-formed XML.

There was never an issue of consistency* before because, even though
the browsers are not validating parsers, the doctypes they hard-code
do support named entities, so they are effectively 'emulating'
validating parsers. There is no inconsistency between the results of a
true validating parser and those of the faux validating parser (at
least in this context).

But there is no DTD for HTML5[1], not even for the XHTML version. Either
we'll get inconsistent results (and errors) when people use named
entities, or every validating XML parser and parser library in the
world that may need to parse XHTML5 will have to be modified to
accommodate a W3C policy of deliberately creating malformed XML.

Shelley

[1] http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2009-March/000192.html

*Well, no inconsistency until RDFa in XHTML, which kind of puts the
_need_ for additional predefined entities in doubt, since the browser
companies felt no urgency to make this change for that particular form
of XHTML.

Received on Sunday, 1 November 2009 14:13:54 UTC