
Re: XHTML character entity support

From: Shelley Powers <shelley.just@gmail.com>
Date: Tue, 3 Nov 2009 13:39:18 -0600
Message-ID: <643cc0270911031139q2f7e0362mba774217daa6a450@mail.gmail.com>
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: Henri Sivonen <hsivonen@iki.fi>, Simon Pieters <simonp@opera.com>, Geoffrey Sneddon <gsneddon@opera.com>, John Cowan <cowan@ccil.org>, "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>, "public-html@w3.org" <public-html@w3.org>
On Tue, Nov 3, 2009 at 12:56 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 11/3/09 1:08 PM, Shelley Powers wrote:
>> Second, what you're discussing seems to be something that's general to
>> all XML parsers, not just browsers. As such, it would be better
>> defined as part of XML core, rather than an end case like HTML5. Or is
>> there something unique that the browsers do that falls outside of
>> normal XML parsing?
> As I understand it, XML core allows several different parser behaviors here
> (ranging from "report a well-formedness error" to "load the DTD and expand
> entities using it" to "use a local catalog" to "just show the unexpanded
> entities as text").  I could be wrong in this understanding, of course;
> please correct me if I am.
> If I understand correctly, browsers have by and large chosen a particular
> behavior: using a local catalog for particular DTDs.  The suggestion is to
> define that those particular DTDs should use a specific local catalog and
> what that local catalog is.
> Since the DTDs involved are the various XHTML DTDs, it seems that this group
> might be the one tasked with such definition.

You have a valid point, as does Henri, and everyone else all the way back to Alexey.

But, there is a sentence in the existing HTML5 specification that reads:

According to the XML specification, XML processors are not guaranteed
to process the external DTD subset referenced in the DOCTYPE. This
means, for example, that using entity references for characters in
XHTML documents is unsafe if they are defined in an external file
(except for &lt;, &gt;, &amp;, &quot; and &apos;)

That is all a web author needs to know. It is clean. It is simple. It
is extremely uncomplicated. It does not require the addition of reams
of text. The only thing I would change is to partner this sentence
with a recommendation to use numeric character references. And even
that's redundant.
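
To make that recommendation concrete, here is a minimal sketch of
rewriting named entity references as numeric character references. It
assumes the HTML 4 entity names that ship in Python's html.entities
module; the function name to_numeric_refs is mine, purely for
illustration. It is a plain text substitution, so it would also touch
anything that looks like an entity inside CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself; these need no DTD.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches a named reference such as &copy; (not numeric ones like &#169;).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(markup: str) -> str:
    """Replace known named references with decimal numeric references.

    Hypothetical helper: predefined XML entities and unknown names are
    left exactly as they were found.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return _NAMED_REF.sub(replace, markup)


print(to_numeric_refs("<p>&copy; 2009 &amp; &pound;5</p>"))
```

The output no longer depends on whether the parser reads an external
DTD subset, which is the whole point of the advice.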

>> Besides, the point is moot: XHTML5 does not have a DTD, so only the
>> five predefined entities work with it.
> Agreed; this discussion isn't about XHTML5 per se.
>> The job of this working group is not to normalize all of the browser
>> quirks and differences. Don't you agree?
> In the area of "html" (whatever that might mean), this does in fact seem
> like the job of this working group, fundamentally.

No, the charter for this group is to provide a DOM, an evolution of
HTML4, an XML serialization, some APIs, and some whizzy gee-whiz
graphical "stuff".

I think it's a mistake to include the areas already included in the
document whose sole purpose seems to be to normalize browser behavior.
Not anything specific to HTML, as markup. Not even anything specific
to the Document Object Model: browser specific stuff.

I don't think we need to compound this with yet another section,
specifically for browsers.

We support an XML serialization, we follow the rules for XML as
specified in the XML 1.0 specification, and among these is the caveat
that when it comes to entities such as &copy; or &pound; there is no
guarantee that these externally defined entities will be available.

For certain doctypes, the browsers support the entities via catalog.
This is consistent with validating parsers.

Where there is inconsistency is when there is an unknown doctype.
Opera doesn't generate an error, but doesn't expand the entity.
Firefox, Safari, and Chrome give an error. The worst inconsistent
case with the "unknown" doctype is with XHTML+RDFa, which does define
the external entities, but which is not in the browsers' list of
doctypes. But the behavior is still consistent, because there is that
rule: there is no guarantee that the parser will read the external
DTD subset.

For XHTML5, which has no DTD, the behavior is consistent: only the
five predefined entities are available, anything else is an error. And
from what I can see, the behavior with this is consistent.
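
The no-DTD case is easy to check outside a browser. The sketch below
uses Python's xml.etree.ElementTree, which is a non-validating parser
and not a browser, so treat it only as an illustration of the XML rule
and not as evidence of what any particular browser does. The helper
name parses is mine.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if the markup is accepted as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# No DOCTYPE at all, as with XHTML5.
print(parses("<p>&lt; &gt; &amp; &quot; &apos;</p>"))  # predefined entities
print(parses("<p>&copy;</p>"))                         # undeclared entity
print(parses("<p>&#169;</p>"))                         # numeric reference
```

With no DOCTYPE, the five predefined entities and the numeric
reference are accepted, while &copy; is rejected as an undefined
entity.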


> -Boris
Received on Tuesday, 3 November 2009 19:39:52 UTC
