
Re: Named character entities

From: William F Hammond <hammond@csc.albany.edu>
Date: 19 May 2003 12:03:40 -0400
To: W3C HTML Specification Discussion <www-html@w3.org>
Message-ID: <i7llx2nctf.fsf@hilbert.math.albany.edu>

Ian Hickson <ian@hixie.ch> writes:

> More to the point, XHTML can't make restrictions on XML parsers beyond
> those of XML. This has to be the case so that arbitrary XML parsers can be
> re-used in XHTML environments, otherwise XHTML processors must have
> specialised XML parsers.

It may be useful for interoperability to require that XHTML be
parsable by arbitrary XML parsers, but it is not inherently impossible
to bring up XHTML as a markup language that, rather than _being_ an
XML application, has a canonically associated XML application.  That
way additional requirements can be imposed.

> In the case of entities, XML says that non-validating parsers need not
> recognise anything outside of the five pre-defined entities. Thus, an
> arbitrary non-validating XML parser will probably not recognise the XHTML
> entities. By the time the XHTML-specific part of the UA gets involved, it
> is likely that the entities are long lost.

The 5 entities are "amp", "lt", "gt", "quot", and "apos".  The list
does NOT include "copy".
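
The point can be checked against one non-validating parser. The sketch
below assumes the expat-based parser in Python's standard library as a
stand-in for "an arbitrary XML parser"; the helper name `resolves` is
made up for illustration.

```python
import xml.etree.ElementTree as ET

# The five entities every XML parser must recognise without a DTD.
PREDEFINED = ("amp", "lt", "gt", "quot", "apos")


def resolves(name):
    """Report whether &name; parses in a document that has no DTD."""
    try:
        ET.fromstring(f"<p>&{name};</p>")
        return True
    except ET.ParseError:
        # expat reports an undefined entity as a well-formedness error.
        return False


for name in PREDEFINED + ("copy",):
    print(name, resolves(name))
```

With no DOCTYPE present, this parser treats "&copy;" as a
well-formedness error rather than passing it through, so there is
nothing left for XHTML-specific code to recover.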

"&copy;" is an interesting case because it is neither used in markup
nor (AFAIK) used natively in association with the language of a
non-ASCII locale.  It is an exception to the idea that CDATA encoding,
e.g., the processing route from an author's keystroke to UTF-8, should
be locale-specific.

Named character entities are especially useful when one needs to
refer to characters outside of those provided natively in one's
locale.  The same need arises in MathML.  The W3C math group
has learned to live without named character entities, though not
easily.  Discussion of the matter continues at www-math@w3.org.

I think it desirable for XHTML user agents to provide some form of
imitation of Mozilla's handling of the 253 character entities defined
in XHTML 1.0.
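
A rough imitation of that behaviour can be sketched with a locally
held entity table.  This is not Mozilla's mechanism, only an
approximation of its visible effect.  It assumes Python's
ElementTree parser, whose `entity` dictionary is consulted for names
that expat leaves unresolved, and it builds the table from the HTML 4
names in `html.entities` plus "apos".  The function name `parse_xhtml`
is hypothetical.

```python
import html.entities
import xml.etree.ElementTree as ET

# HTML 4 entity names from the standard library, plus "apos",
# which XHTML 1.0 adds to the set.
XHTML_ENTITIES = {
    name: chr(codepoint)
    for name, codepoint in html.entities.name2codepoint.items()
}
XHTML_ENTITIES.setdefault("apos", "'")


def parse_xhtml(document):
    """Parse a document, resolving XHTML entity names from a local table."""
    parser = ET.XMLParser()
    # Names expat cannot resolve itself are looked up here.
    parser.entity.update(XHTML_ENTITIES)
    parser.feed(document)
    return parser.close()


DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>&copy; 2003</p></body></html>"
)

root = parse_xhtml(DOC)
print("".join(root.itertext()))
```

The table is only consulted because the document carries a DOCTYPE
with an external subset; the DTD itself is never fetched.  Without the
DOCTYPE, expat rejects the unknown name before the table is reached.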

One way that would make sense for XHTML documents is to require that
an external (presumably "file") entity defined in the internal
declaration subset *must* be parsed when its name begins with one of a
list of specified strings such as "xhtml-".  Mozilla's example would
suggest that user agents would always have these external entities
available on the local platform.  Additional XML namespaces would
reasonably extend the list of specified initial segments of required
external entity names.
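
As a sketch of how a user agent might apply such a rule, the code
below lists the external entities declared in a document's internal
subset and flags those whose names begin with a required prefix.  It
uses Python's expat binding.  The entity names, the file names, the
choice of parameter entities, and the one-item prefix list are all
assumptions made for illustration.  Only the string "xhtml-" comes
from the proposal above.

```python
from xml.parsers import expat

# Hypothetical prefix list; "xhtml-" is the only example given.
REQUIRED_PREFIXES = ("xhtml-",)


def external_entities(document):
    """Map entity name to system identifier for each external parsed
    entity declared in the document's internal subset."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External parsed entities have no literal value and no notation.
        if value is None and system_id is not None and notation is None:
            found[name] = system_id

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


def must_be_parsed(name):
    """True if the proposed rule would oblige a user agent to read it."""
    return name.startswith(REQUIRED_PREFIXES)


DOC = """<!DOCTYPE html [
  <!ENTITY % xhtml-lat1 SYSTEM "xhtml-lat1.ent">
  <!ENTITY % local-extras SYSTEM "extras.ent">
]>
<html/>"""

for name, system_id in external_entities(DOC).items():
    print(name, system_id, must_be_parsed(name))
```

A user agent following the proposal would then read the flagged
entities from copies held on the local platform and could continue to
skip the others, as non-validating parsers are already allowed to do.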

                                    -- Bill
Received on Monday, 19 May 2003 12:03:43 UTC
