- From: <keshlam@us.ibm.com>
- Date: Tue, 8 Aug 2000 08:41:27 -0400
- To: www-dom@w3.org
(Lauren: FAQ?)

>I'm German, so I use a lot of special characters like &auml; in my
>pages. Are all these characters single nodes in the DOM?

This depends on whether you've asked your parser to retain Entity Reference nodes or not. Names such as &auml; are defined by your DTD rather than by XML itself, so they _are_ entities.

If you've asked for Entity Reference nodes to be created in the DOM, each of these takes two nodes --- one for the Entity Reference, and one for the Text node containing the character.

If you've told the DOM generator (parser or other code) to "expand" the entity references, then only the Text node would appear... and it would then be merged with adjacent Text nodes when a normalize() operation is done. Note that the DOM requires that XML processors -- parsers -- normalize documents before they are presented to the user; other applications may not do so.

If you want to avoid the inefficiency of using two nodes for a single character, there are ways to do so. One is to use a numeric character reference (such as &#34;) to encode the character, rather than a declared entity; these are always converted directly into characters. An easier solution would be to use an encoding that directly supports the language that you are writing in, so you can simply type the intended characters and have the XML parser read them correctly; check whether your XML parser supports the encoding used for German.

> And when I create a TextNode, can I use those entities there?

All DOM API calls operate in terms of UTF-16 character strings. These can directly represent most characters in most languages, which would allow you to say

     document.createTextNode("mäh")

When you write the document out as XML syntax, your serializer is responsible for checking the output encoding and deciding whether it can write the character directly or must use a numeric character reference.

Do _not_ write

     document.createTextNode("m&auml;h") ...
that puts the ampersand into the document's contents, and is equivalent to writing m&#38;auml;h in your XML file.

An Entity node declares an entity; an Entity Reference node is a reference to that entity. If you really want to create an entity reference (&auml;) in your document and have it written out that way, you must create an Entity Reference node

     document.createEntityReference("auml")

and insert that. You don't have to create the children of the EntityReference; they are copied automatically from the Entity previously stored in the DocumentType.

Note that in DOM Level 2 there is still no portable way to manually declare new Entities; we expect to address that in DOM Level 3's "Content Model" chapter.

Hope that helps...

______________________________________
Joe Kesselman  / IBM Research
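
The createTextNode() advice in this message can be checked with a small sketch. It assumes Python's xml.dom.minidom binding of the DOM rather than whichever implementation the original poster uses, and the variable names (literal, escaped, ascii_xml and so on) are illustrative only. It shows that a Text node built from the literal character holds three characters, that entity syntax passed to createTextNode() is stored as plain text and escaped on output, that a serializer targeting an encoding without the character falls back to a numeric character reference, and that normalize() merges adjacent Text nodes. minidom does not implement EntityReference nodes, so createEntityReference() is not covered here.

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Build an empty document with a <p> root element.
doc = getDOMImplementation().createDocument(None, "p", None)
root = doc.documentElement

# Literal character ("\u00e4" is a-umlaut): the Text node holds m, a-umlaut, h.
literal = doc.createTextNode("m\u00e4h")
root.appendChild(literal)
literal_xml = root.toxml()                   # serialized as a str, character written directly
ascii_xml = root.toxml(encoding="us-ascii")  # ASCII cannot hold the character
root.removeChild(literal)

# Entity syntax in the string: the ampersand becomes document content.
escaped = doc.createTextNode("m&auml;h")
root.appendChild(escaped)
escaped_xml = root.toxml()
root.removeChild(escaped)

# Two adjacent Text nodes are merged into one by normalize().
root.appendChild(doc.createTextNode("m"))
root.appendChild(doc.createTextNode("\u00e4h"))
before = len(root.childNodes)
root.normalize()
after = len(root.childNodes)

# A numeric character reference is converted to the character while parsing.
parsed_text = parseString("<p>m&#228;h</p>").documentElement.firstChild.data

print(ascii_xml)      # b'<p>m&#228;h</p>'
print(escaped_xml)    # <p>m&amp;auml;h</p>
print(before, after)  # 2 1
```

The second print is the case the message warns against: the file ends up containing the eight characters of "m&auml;h" as text, not the three-character word.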
Received on Tuesday, 8 August 2000 08:41:46 UTC