- From: Andrew Clover <and-w3@doxdesk.com>
- Date: Sat, 05 Jun 2004 22:35:57 +0200
- To: www-dom@w3.org
Roopa Trivedi <rotrived@cisco.com> wrote:

> I want to understand how the various DOM parsers are expected to behave when
> the XML document contains numeric character references.

They are converted to plain characters and included in Text nodes as if the characters had been included directly. (The built-in entity references such as &amp; behave the same way.)

> Should the DOM parsers convert the numeric character references for "ph" to
> something?

Yes, the Unicode character for that phoneme. By spec, DOM implementations must use a character set that supports Unicode.

> If yes, then if I want to convert it back to XML, how should these be
> converted back to the numeric values?

They shouldn't, normally, for UTF-8 and other encodings capable of representing the entire Unicode repertoire. Numeric character references should come out only when you serialise a document containing an IPA character to an encoding that doesn't support IPA characters, for example ISO-8859-1 or US-ASCII.

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
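To illustrate the round trip described above, here is a minimal sketch using Python's standard xml.dom.minidom. It is only one DOM implementation, and the sample document and the choice of U+0278 (decimal 632) as the IPA character are assumptions made for the example, not taken from the original question.

```python
from xml.dom import minidom

# Hypothetical sample: the same IPA character written as a decimal and a
# hexadecimal numeric character reference, plus a built-in entity reference.
source = "<p>&#632; and &#x278; &amp; more</p>"

doc = minidom.parseString(source)
root = doc.documentElement

# After parsing, the references are gone: the Text node(s) hold plain characters.
text = "".join(node.data for node in root.childNodes)

# Serialising to an encoding that covers all of Unicode keeps the character itself.
utf8_xml = doc.toxml(encoding="utf-8")

# Serialising to an encoding that cannot represent the character makes
# minidom fall back to a numeric character reference.
ascii_xml = doc.toxml(encoding="us-ascii")

print(repr(text))
print(utf8_xml)
print(ascii_xml)
```

Note that the serialiser does not remember whether a reference was originally decimal or hexadecimal; once parsed, both are simply the same character.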
Received on Saturday, 5 June 2004 16:34:22 UTC