RE: Using Entity References in XSL Templates

On the XSL list, Michael Kay wrote:
> I don't see the value of outputting &nbsp; rather than &#160;

If I am reading the specs right, and I'm not sure that I am, HTML 4 uses
numeric character references to refer exclusively to code positions in the
document's character set, while named entities refer to fixed code positions
in either ISO-8859-1 or UCS, depending on which entity you're referring to.
The common ones refer to ISO-8859-1 characters, while the Greek and math
ones, for example, refer to UCS characters.
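To make the named-entity half of this concrete, here is a small sketch using
Python's html.entities.name2codepoint table, which maps the HTML 4 entity
names to UCS code points (Python is just my choice for illustration, and
classify is a made-up helper name). The Latin-1 entities land below 256,
i.e. inside ISO-8859-1, while the Greek and math ones land well above it:

```python
from html.entities import name2codepoint

def classify(names):
    """Map each HTML 4 entity name to (UCS code point, within ISO-8859-1?)."""
    return {n: (name2codepoint[n], name2codepoint[n] < 256) for n in names}

for name, (cp, latin1) in classify(["nbsp", "eacute", "alpha", "sum"]).items():
    # &nbsp; and &eacute; fall in ISO-8859-1; &alpha; and &sum; do not.
    print(f"&{name};  U+{cp:04X}  ISO-8859-1 range: {latin1}")
```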

In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a
non-breaking space, whereas &#160; is simply character number 160 in the
character set of the document encoding, and thus may not refer to a
non-breaking space. XML, in contrast, consistently uses numeric character
references to refer to UCS code positions, independent of the document
encoding.
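A quick way to see the XML behaviour: parse the same tiny document declared
in three different encodings and check what &#160; resolves to. This is a
sketch using Python's xml.etree.ElementTree; the helper name resolve_ncr is
made up for illustration:

```python
import xml.etree.ElementTree as ET

def resolve_ncr(encoding: str, ref: str = "&#160;") -> str:
    """Parse a one-element XML document declared (and encoded) as `encoding`
    and return the text the character reference resolves to."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><p>{ref}</p>'
    # The markup is pure ASCII, so only the declaration differs between runs.
    root = ET.fromstring(doc.encode(encoding))
    return root.text

for enc in ("UTF-8", "ISO-8859-1", "US-ASCII"):
    # The reference names UCS code point 160 whatever the declared encoding.
    print(enc, f"U+{ord(resolve_ncr(enc)):04X}")
```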

It strikes me as a little odd that the XHTML 1.0 PR doesn't address this
disparity. Since XHTML is XML, I would assume that its numeric character
references must refer to UCS code positions. So in order to transform an
HTML document into XHTML, one would have to determine the document encoding
and use it as the basis for converting numeric character references to
their XML equivalents where necessary. Right? Wrong?
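If my reading is right (and that is a big if), the conversion step might
look something like the following sketch. It assumes a single-byte document
encoding, handles only decimal references below 256, and leaves everything
else alone; rewrite_ncrs is a hypothetical helper, not part of any existing
tool:

```python
import re

# Decimal numeric character references such as &#150;
NCR = re.compile(r"&#(\d+);")

def rewrite_ncrs(html: str, encoding: str) -> str:
    """Hypothetical: treat each &#N; (N < 256) as byte N of a single-byte
    `encoding` and rewrite it as a hex reference to the matching UCS code
    point."""
    def repl(m):
        n = int(m.group(1))
        if n > 255:
            return m.group(0)  # no single-byte interpretation; keep as-is
        try:
            ch = bytes([n]).decode(encoding)
        except UnicodeDecodeError:
            return m.group(0)  # byte is undefined in this encoding
        return f"&#x{ord(ch):X};"
    return NCR.sub(repl, html)

print(rewrite_ncrs("a&#150;b&#160;", "cp1252"))
```

With windows-1252 as the source encoding, &#150; becomes &#x2013; (an en
dash), whereas under ISO-8859-1 the same reference would become &#x96;, a C1
control character. Multi-byte encodings would need a different approach.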

References:
http://www.w3.org/TR/1999/PR-html40-19990824/charset.html
http://www.w3.org/TR/1999/PR-html40-19990824/sgml/entities.html
http://www.w3.org/TR/1998/REC-xml-19980210.html#sec-references

(followups to www-html@w3.org, please)

   - Mike
___________________________________________________________
Mike J. Brown, software engineer, Webb Interactive Services
XML/XSL stuff: http://www.skew.org/    http://www.webb.net/

Received on Monday, 24 January 2000 05:03:57 UTC