RE: Using Entity References in XSL Templates from Arjun Ray on 2000-01-24 (www-html@w3.org from January 2000)

From: Arjun Ray <aray@q2.net>
Date: Mon, 24 Jan 2000 06:28:31 -0500 (EST)
To: www-html@w3.org
cc: xsl-list@mulberrytech.com
Message-ID: <Pine.LNX.4.10.10001240606120.13213-100000@mail.q2.net>

On Fri, 21 Jan 2000, Mike Brown wrote:

> HTML 4 uses numeric entities ...

Please note that _character reference_ and _entity reference_ are distinct
categories.  In particular, the former is not an "entity" at all. 

> ... to refer exclusively to code positions in the document's character
> set, while named entities refer to character positions in either
> ISO-8859-1 or UCS, depending on which entity you're referring to.

Not quite.  In the entity declarations, the entities are defined in terms
of character references.  That's OK, because the connection between
ISO-8859-1/UCS and the document character set is determined by the SGML
declaration. 
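A minimal Python sketch of the two steps described above. It uses a tiny hypothetical entity table that mirrors three HTML 4 DTD declarations; in the DTD, for instance, nbsp is declared with the replacement text "&#160;". An entity reference is looked up by name and replaced by its replacement text. Only after that is the resulting character reference mapped to a character of the document character set, which is assumed here to be UCS as in HTML 4. A character reference never goes through the first step, because it is not an entity. The helper names are illustrative, not from any library, and only the decimal form of character references is handled.

```python
import re

# Hypothetical mini entity table (name -> replacement text). As in the
# HTML 4 DTD, each replacement text is itself a character reference.
ENTITIES = {
    "nbsp": "&#160;",     # U+00A0 no-break space
    "eacute": "&#233;",   # U+00E9
    "mdash": "&#8212;",   # U+2014
}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")  # &name;
CHAR_REF = re.compile(r"&#([0-9]+);")                 # &#nnn; (decimal only)


def expand_entity_refs(text):
    """Step 1 (hypothetical helper): replace each known entity reference by
    its replacement text. Character references do not match ENTITY_REF and
    pass through untouched; unknown entity names are left as-is."""
    return ENTITY_REF.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), text)


def resolve_char_refs(text):
    """Step 2 (hypothetical helper): turn each character reference into the
    character with that number in the document character set (UCS assumed)."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


def resolve(text):
    return resolve_char_refs(expand_entity_refs(text))


print(expand_entity_refs("a&nbsp;b"))  # a&#160;b
print(repr(resolve("a&nbsp;b")))       # 'a\xa0b'
print(repr(resolve("a&#160;b")))       # 'a\xa0b'
```

Both spellings end up as the same character because the named form is only a shorthand for the numeric one.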

> In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a
> non-breaking space ... but &#160; is simply character number 160 in the
> character set of the document encoding, 

Oops.  Not at all.  Absolutely not.  The encoding has absolutely nothing,
repeat **NOTHING** to do with this.  The I18n spec is required reading:

  http://www.ietf.org/rfc/rfc2070.txt

(It's the job of the "entity manager" to transcode from the encoding to
the document character set.)
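To illustrate why the encoding does not matter, here is a Python sketch that models the entity manager as a plain decoding step (the entity_manager function is a hypothetical stand-in, not a real API). The same document text is serialized in several encodings: the bytes differ, but decoding with the declared encoding recovers the same characters, after which &#160; resolves to U+00A0 every time. A raw 0xA0 byte, by contrast, means different characters in different encodings; cp437 is used as the contrasting case. Python's html.unescape stands in for reference resolution; it follows HTML5 rather than HTML 4/SGML rules, which makes no difference for &#160;.

```python
import html

DOC = "a&#160;b"  # document text containing a character reference


def entity_manager(raw, declared_encoding):
    """Hypothetical stand-in for the entity manager: transcode bytes in the
    declared encoding into characters of the document character set
    (modelled here as a Python str, i.e. UCS code points)."""
    return raw.decode(declared_encoding)


results = {}
for enc in ("iso-8859-1", "utf-8", "utf-16", "cp1252", "cp437"):
    raw = DOC.encode(enc)               # the bytes differ per encoding...
    text = entity_manager(raw, enc)     # ...but the decoded characters do not
    results[enc] = html.unescape(text)  # resolve the character reference

for enc, value in results.items():
    print(f"{enc:>10}: {value!r}")      # 'a\xa0b' for every encoding

# A raw byte 0xA0, unlike &#160;, IS encoding-dependent:
print(repr(b"\xa0".decode("iso-8859-1")))  # '\xa0' (no-break space)
print(repr(b"\xa0".decode("cp437")))       # 'á'
```

The design point is the order of operations: decoding happens first, so by the time a reference is resolved there are only characters left, and the number in &#160; can only be interpreted against the document character set.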

These references may also be helpful:

  http://www.mulberrytech.com/papers/docchar.htm
  http://www.hut.fi/u/jkorpela/chars.html
  http://candl.let.ruu.nl/Archive/cts/html/scharacterset.htm

Arjun

Received on Monday, 24 January 2000 06:20:52 UTC