
"<" and ">" within text nodes

From: Larry Watanabe <LWatanab@JetForm.com>
Date: Fri, 16 Apr 1999 09:23:16 -0400
Message-ID: <111CF63B7D2ED211830000805F65A2FFAD874A@OTTMAIL2>
To: www-dom@w3.org

Text nodes cannot contain arbitrary text; in particular, "<" and ">" written
out as-is will cause SAX parse errors when the node is read back in. It is
possible to enclose this text within a CDATA section, but then there is the
equivalent problem with the CDATA terminator ("]]>"). In addition, CDATA may
be undesirable for other reasons (e.g. external requirements).
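
To make the CDATA terminator problem concrete, here is a minimal Python
sketch of one common workaround: split every "]]>" in the text across two
adjacent CDATA sections. The names to_cdata and read_back and the wrapper
element <t> are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def to_cdata(text):
    # Split each "]]>" so that "]]" closes one section and ">" opens
    # the next; the terminator never appears inside a single section.
    safe = text.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + safe + CDATA_END


def read_back(text):
    # Wrap the CDATA in a hypothetical <t> element and parse it again.
    element = ET.fromstring("<t>" + to_cdata(text) + "</t>")
    return element.text
```

A parser reports the adjacent sections as ordinary character data, so the
reader sees the original text again, but the serialized form is harder to
read than plain entity escaping.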

These characters can be encoded as "&lt;" and "&gt;", which also requires
that "&" be encoded as "&amp;". However, this seems like a) an ad hoc
solution, and b) something which has probably already been solved.
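
As a rough illustration of the entity encoding described above, the
following Python sketch uses the standard library's
xml.sax.saxutils.escape and then reads the result back through a SAX
parser. The names TextCollector, encode_text and round_trip and the
wrapper element <t> are hypothetical.

```python
import xml.sax
from xml.sax.saxutils import escape


class TextCollector(xml.sax.ContentHandler):
    """Accumulates all character data reported by the SAX parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces.
        self.chunks.append(content)


def encode_text(text):
    # escape() replaces "&" first, then "<" and ">", so an "&lt;" that
    # is already in the input becomes "&amp;lt;" and is preserved.
    return escape(text)


def round_trip(text):
    # Wrap the encoded text in a hypothetical <t> element and parse it.
    document = "<t>" + encode_text(text) + "</t>"
    handler = TextCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return "".join(handler.chunks)
```

Because the parser expands the predefined entities itself, no separate
decoding routine is needed on the reading side. Strictly, XML 1.0 only
requires "<" and "&" to be escaped in character data, with ">" needing it
only as part of "]]>"; escaping all three is simply the safe default. This
sketch does not address characters that XML 1.0 cannot represent at all,
such as most control characters, or the parser's normalization of carriage
returns to line feeds.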

Q: Does anyone know of a general encoding routine for encoding the text
within a Text node that 

	a) preserves information; the same text read in by a SAX parser will
be converted to the correct characters without the use of a special decoding
routine?
	b) handles all other cases besides "<" and ">" if there are any?

Thank you.

-Larry Watanabe  lwatanab@jetform.com
Received on Friday, 16 April 1999 09:31:16 GMT
