- From: Arjun Ray <aray@q2.net>
- Date: Mon, 24 Jan 2000 06:28:31 -0500 (EST)
- To: www-html@w3.org
- cc: xsl-list@mulberrytech.com
On Fri, 21 Jan 2000, Mike Brown wrote:

> HTML 4 uses numeric entities ...

Please note that _character reference_ and _entity reference_ are distinct categories. In particular, the former is not an "entity" at all.

> ... to refer exclusively to code positions in the document's character
> set, while named entities refer to character positions in either
> ISO-8859-1 or UCS, depending on which entity you're referring to.

Not quite. In the entity declarations, the entities are defined in terms of character references. That's OK, because the connection between ISO-8859-1/UCS and the document character set is determined by the SGML declaration.

> In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a
> non-breaking space ... but &#160; is simply character number 160 in the
> character set of the document encoding,

Oops. Not at all. Absolutely not. The encoding has absolutely nothing, repeat **NOTHING** to do with this. The I18n spec is required reading:

http://www.ietf.org/rfc/rfc2070.txt

(It's the job of the "entity manager" to transcode from the encoding to the document character set.)

These references may be helpful also:

http://www.mulberrytech.com/papers/docchar.htm
http://www.hut.fi/u/jkorpela/chars.html
http://candl.let.ruu.nl/Archive/cts/html/scharacterset.htm

Arjun
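To illustrate the point about encodings, here is a minimal Python sketch. It is not taken from any spec or parser: the function names `entity_manager` and `resolve_references` are hypothetical, and Python's `bytes.decode` and `html.unescape` merely stand in for the two steps. It assumes the document character set is UCS, as in HTML 4. The same markup is serialized in three encodings, transcoded to the document character set, and only then are the references resolved.

```python
import html


def entity_manager(raw: bytes, encoding: str) -> str:
    """Hypothetical stand-in for the entity manager: transcode the bytes
    of the transfer encoding into the document character set (UCS)."""
    return raw.decode(encoding)


def resolve_references(text: str) -> str:
    """Resolve &#160; and &nbsp; against the document character set."""
    return html.unescape(text)


# The same markup, serialized in three different encodings.
markup = "Привет&#160;мир&nbsp;!"
results = {}
for encoding in ("koi8_r", "iso8859_5", "utf_8"):
    raw = markup.encode(encoding)
    results[encoding] = resolve_references(entity_manager(raw, encoding))

# Every encoding yields the same text, with U+00A0 at both references.
assert len(set(results.values())) == 1

# By contrast, what the raw byte 0xA0 means is an encoding-level matter:
# in KOI8-R it does not decode to a no-break space.
byte_a0_in_koi8r = b"\xa0".decode("koi8_r")
```

In this sketch the character reference never sees the encoding: by the time `&#160;` is resolved, the bytes have already been mapped into the document character set, so 160 always means U+00A0.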
Received on Monday, 24 January 2000 06:20:52 UTC