"<" and ">" within text nodes from Larry Watanabe on 1999-04-16 (www-dom@w3.org from April to June 1999)

From: Larry Watanabe <LWatanab@JetForm.com>
Date: Fri, 16 Apr 1999 09:23:16 -0400
To: www-dom@w3.org
Message-ID: <111CF63B7D2ED211830000805F65A2FFAD874A@OTTMAIL2>

Text nodes cannot contain  arbitrary text; in particular "<" and ">" will
cause SAX parse errors when the node is read back in. It is possible to
enclose this text withn a CDATA spection, but then there is the equivalent
problem with the CDATA terminator. In addition, CDATA may be undesirable for
other reasons (e.g. external requirements).

These characters can be encoded as "&lt" and "&gt", which also requires that
"&" be encoded as "&amp". However, this seems like a) an ad hoc solution,
and b) something which has probably already been solved. 

Q: Does anyone know of a general encoding routine for encoding the text
within a Text node that 

	a) preserves information; the same text read in by a SAX parser will
be converted to the correct characters without the use of a special decoding
routine?
	b) handles all other cases besides "<" and ">" if there are any?

Thank you.

-Larry Watanabe  lwatanab@jetform.com

Received on Friday, 16 April 1999 09:31:16 UTC