W3C home > Mailing lists > Public > www-dom@w3.org > July to September 1998

Re: Document cleanup suggestions

From: John Cowan <cowan@locke.ccil.org>
Date: Mon, 17 Aug 1998 15:10:48 -0400
Message-ID: <35D88037.1F58DD80@locke.ccil.org>
To: DOM List <www-dom@w3.org>
Some time ago, keshlam@us.ibm.com wrote:

> CDATA: this interface is-a Text. That means it implements the split and
> join operations, and can be passed to them... which means a CDATA could be
> passed to Text.joinText, or vice versa. The spec doesn't mention this, and
> probably should point out that the result of such a combination, if it
> works at all, would probably have to be a CDATA due to the risk of
> containing characters that normal Text can't.

Actually, Text (and for that matter CData) can contain any characters
whatsoever.  It just happens that when you *write out* a Text or CData
object you need to canonicalize it, and in different ways.  This is not
something that the current DOM has specific interfaces for.

This is a consequence of the fact that Text objects contain character
data, not markup.

> There is, of course, a hazard involved in CDATA: since it can legitimately
> contain both > and ] characters, it's possible that during editing of the
> DOM we might manage to introduce ]]> into its text contents. This would
> yield bad XML when written out since that's the CDATA terminating sequence.

Only if your writeOut method is unacceptably naive.  The writeOut method for
Text-not-CData objects should emit "<" as "&lt;" and "&" as "&amp;", and the
writeOut method for CData should emit "]]>" as "]]&gt;".

> But short of testing the characters on either side of the editing point
> every time we modify its contents, I don't see how we can prevent this. Do
> we want to commit to doing that check -- and if so, what do we do when we
> find that the user has asked us to break the document? (The hazard exists
> both for inserts and for deletion, of course; CDATA just isn't all that
> robust.)

The only thing that can "break the document" is an attempt to insert
characters that aren't Chars by the XML rules; i.e. non-Unicode values.

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
Received on Monday, 17 August 1998 15:11:06 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 22 June 2012 06:13:45 GMT