RE: Using Entity References in XSL Templates

On Fri, 21 Jan 2000, Mike Brown wrote:

> HTML 4 uses numeric entities ...

Please note that _character reference_ and _entity reference_ are distinct
categories.  In particular, the former is not an "entity" at all. 

> ... to refer exclusively to code positions in the document's character
> set, while named entities refer to character positions in either
> ISO-8859-1 or UCS, depending on which entity you're referring to.

Not quite.  In the entity declarations, the entities are defined in terms
of character references.  That's OK, because the connection between
ISO-8859-1/UCS and the document character set is determined by the SGML
declaration. 
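
To make that concrete, here is a minimal sketch. It assumes Python's
standard `html.entities` table as a stand-in for the HTML entity
declarations (it is not the SGML DTD itself), and the helper name
`as_character_reference` is made up for illustration.  Each named entity
comes down to a code point number, which is exactly what a numeric
character reference names:

```python
from html.entities import name2codepoint

def as_character_reference(name: str) -> str:
    """Rewrite a named entity as the numeric character reference that
    its declaration boils down to (hypothetical helper)."""
    return f"&#{name2codepoint[name]};"

print(as_character_reference("nbsp"))    # &#160;
print(as_character_reference("eacute"))  # &#233;
```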

> In HTML, &nbsp; is always ISO-8859-1 character number 160, i.e. a
> non-breaking space ... but &#160; is simply character number 160 in the
> character set of the document encoding, 

Oops.  Not at all.  Absolutely not.  The encoding has absolutely nothing,
repeat **NOTHING** to do with this.  The I18n spec is required reading:

  http://www.ietf.org/rfc/rfc2070.txt

(It's the job of the "entity manager" to transcode from the encoding to
the document character set.)
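
A rough sketch of those two steps, assuming Python's `bytes.decode` plays
the part of the entity manager and `html.unescape` plays the part of the
parser resolving references (the function name `parse` and the sample
text are made up for illustration).  The same source text is stored in
three different encodings, and the references resolve identically each
time:

```python
import html

def parse(raw: bytes, encoding: str) -> str:
    # Step 1 ("entity manager"): transcode from the external encoding
    # into the document character set (Unicode here).
    text = raw.decode(encoding)
    # Step 2: resolve references against the document character set.
    # The encoding plays no part in this step.
    return html.unescape(text)

source = "caf&#233;&nbsp;|&#160;"

results = {
    enc: parse(source.encode(enc), enc)
    for enc in ("latin-1", "koi8_r", "utf-16")
}

# All three encodings yield the same characters.
assert len(set(results.values())) == 1
# &#233; is U+00E9 and both &nbsp; and &#160; are U+00A0, every time.
assert results["koi8_r"] == "caf\u00e9\u00a0|\u00a0"
# In KOI8-R the *byte* 0xE9 is some other character, yet &#233; is not
# affected by that.
assert bytes([0xE9]).decode("koi8_r") != "\u00e9"
```

Note that the number in `&#233;` is never looked up in the KOI8-R table;
only the raw bytes of the file are.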

These references may be helpful also:

  http://www.mulberrytech.com/papers/docchar.htm
  http://www.hut.fi/u/jkorpela/chars.html
  http://candl.let.ruu.nl/Archive/cts/html/scharacterset.htm


Arjun

Received on Monday, 24 January 2000 06:20:52 UTC