- From: Mike Brown <mbrown@corp.webb.net>
- Date: Fri, 21 Jan 2000 13:45:09 -0500 (EST)
- To: "'xsl-list@mulberrytech.com'" <xsl-list@mulberrytech.com>
- Cc: "'www-html@w3.org'" <www-html@w3.org>
On the XSL list, Michael Kay wrote:
> I don't see the value of outputting &nbsp; rather than &#160;

If I am reading the specs right, and I'm not sure that I am, HTML 4 uses numeric entities to refer exclusively to code positions in the document's character set, while named entities refer to character positions in either ISO-8859-1 or UCS, depending on which entity you're referring to. The common ones refer to ISO-8859-1 characters, while the Greek and Math ones, for example, refer to UCS characters.

In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a non-breaking space ... but &#160; is simply character number 160 in the character set of the document encoding, and thus may not refer to a non-breaking space.

XML, in contrast, consistently uses numeric entities to refer to UCS code positions, independent of the document encoding.

It strikes me as being a little weird that the XHTML 1.0 PR doesn't address this disparity. I would assume that XHTML is XML and thus numeric character references must refer to UCS code positions. Thus in order to transform an HTML document into XHTML, one must determine the document encoding and use that as a basis for the possible transformation of numeric entity references to their XML equivalents.

Right? Wrong?

References:
http://www.w3.org/TR/1999/PR-html40-19990824/charset.html
http://www.w3.org/TR/1999/PR-html40-19990824/sgml/entities.html
http://www.w3.org/TR/1998/REC-xml-19980210.html#sec-references

(followups to www-html@w3.org, please)

- Mike
___________________________________________________________
Mike J. Brown, software engineer, Webb Interactive Services
XML/XSL stuff: http://www.skew.org/
http://www.webb.net/
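The transformation proposed in the message above (re-basing numeric references from the document encoding onto UCS) could be sketched roughly as follows in Python. This is only an illustration of that reading of the specs, which the post itself leaves open to correction; it is not a statement of what HTML 4 or XHTML actually requires. The function name `remap_numeric_refs` and the restriction to single-byte encodings are assumptions made for the example.

```python
import re

# Matches decimal (&#160;) and hexadecimal (&#xA0;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def remap_numeric_refs(text, source_encoding):
    """Hypothetical helper: treat each reference number as a single-byte
    code position in source_encoding and rewrite it as a decimal reference
    to the corresponding UCS code point."""

    def fix(match):
        hex_digits, dec_digits = match.groups()
        position = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if position > 0xFF:
            # Assumption: only single-byte positions are re-based.
            return match.group(0)
        try:
            char = bytes([position]).decode(source_encoding)
        except UnicodeDecodeError:
            # Undefined position (or a multi-byte encoding): leave untouched.
            return match.group(0)
        return "&#%d;" % ord(char)

    return _NCR.sub(fix, text)


# Position 160 in ISO-8859-1 is already U+00A0, so nothing changes.
print(remap_numeric_refs("a&#160;b", "iso-8859-1"))  # a&#160;b
# Position 128 in windows-1252 is the euro sign, U+20AC (decimal 8364).
print(remap_numeric_refs("&#128;", "cp1252"))        # &#8364;
```

Named references such as &nbsp; are deliberately left alone here, since under the reading above they already have a fixed meaning.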
Received on Monday, 24 January 2000 05:03:57 UTC