Re: Numeric character references from Andrew Clover on 2004-06-05 (www-dom@w3.org from April to June 2004)

From: Andrew Clover <and-w3@doxdesk.com>
Date: Sat, 05 Jun 2004 22:35:57 +0200
To: www-dom@w3.org
Message-ID: <40C22EAD.9020603@doxdesk.com>

Roopa Trivedi <rotrived@cisco.com> wrote:

> I want to understand how the various DOM parsers are expected to behave when
> the XML document contains numeric character references.

They are converted to the characters they denote and stored in Text 
nodes exactly as if those characters had been typed directly. (The 
predefined entity references such as &amp; are handled the same way.)
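
As a minimal sketch of that behaviour, assuming Python's standard
xml.dom.minidom as the DOM implementation (the original question did not
name a parser) and U+0251 as an arbitrary IPA letter picked for
illustration:

```python
from xml.dom.minidom import parseString

# U+0251 is an arbitrary IPA letter chosen for illustration; it is written
# once as a hexadecimal and once as a decimal character reference.
doc = parseString("<p>&#x251; &#593; &amp;</p>")

# Join the Text children in case the parser split the content into
# several adjacent nodes.
text = "".join(
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == node.TEXT_NODE
)

# The references are gone; only the characters themselves remain.
print(ascii(text))  # prints '\u0251 \u0251 &'
```

Once parsed, the DOM keeps no record of whether a character was written
directly or as a reference.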

> Should the DOM parsers convert the numeric character references for "ph" to
> something?

Yes, to the Unicode character for that phoneme. By spec, DOM strings 
are Unicode (a DOMString is encoded as UTF-16), so any such character 
can be represented.

> If yes, then if I want to convert it back to XML, how should these be
> converted back to the numeric values?

Normally they shouldn't be: UTF-8 and the other encodings that can 
represent the entire Unicode repertoire can carry the character 
directly. Numeric character references should appear only when you 
serialise a document containing an IPA character to an encoding that 
cannot represent it, for example ISO-8859-1 or US-ASCII.
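
The difference between the two kinds of encoding can be sketched as
follows, again assuming Python's xml.dom.minidom and the illustrative
character U+0251. It relies on minidom's Python 3 behaviour of replacing
characters the target encoding cannot represent with character
references; other serialisers may behave differently:

```python
from xml.dom.minidom import parseString

# U+0251 is an arbitrary IPA letter chosen for illustration.
doc = parseString("<p>&#x251;</p>")

# UTF-8 can represent the character, so it is written out directly
# as the two bytes C9 91.
utf8_bytes = doc.toxml(encoding="utf-8")

# US-ASCII cannot, so the serialiser falls back to a numeric
# character reference.
ascii_bytes = doc.toxml(encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)
```

Here the reference reappears as the decimal form &#593; even though the
source used the hexadecimal form &#x251;, since the DOM never stored the
original spelling.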

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

Received on Saturday, 5 June 2004 16:34:22 UTC