- From: John Cowan <cowan@locke.ccil.org>
- Date: Fri, 16 Apr 1999 13:00:05 -0400 (EDT)
- To: LWatanab@JetForm.com (Larry Watanabe)
- Cc: www-dom@w3.org
Larry Watanabe scripsit:

> These characters can be encoded as "&lt;" and "&gt;", which also requires that
> "&" be encoded as "&amp;". However, this seems like a) an ad hoc solution,
> and b) something which has probably already been solved.

That *is* the solution. You cannot just blindly write out a Text node; you must check for & and < and ]]> and make the correct substitutions, just as you must watch for characters unrepresentable in the output charset and write character references (unless you are writing UTF-8 or UTF-16).

> Q: Does anyone know of a general encoding routine for encoding the text
> within a Text node that
>
> a) preserves information; the same text read in by a SAX parser will
> be converted to the correct characters without the use of a special decoding
> routine?
> b) handles all other cases besides "<" and ">" if there are any?

No, but it's easy to concoct one along the lines I mention above. The hardest part is probably finding out what the current character repertoire (= set of representable characters) for the output is.

--
John Cowan  cowan@ccil.org
e'osai ko sarji la lojban. (Please support Lojban.)
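The routine described in the reply (substitute for &, < and ]]>, then write character references for anything the output charset cannot represent) might be sketched in Python as below. This is an illustrative sketch, not code from the thread: the function names `escape_markup`, `encode_for_output` and `round_trip` are hypothetical. It sidesteps the "what is the current repertoire" problem by letting the Python codec for the chosen output encoding decide, via the standard `xmlcharrefreplace` error handler. The `round_trip` helper checks point (a) by feeding the result to a SAX parser and confirming the original characters come back with no special decoding step.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def escape_markup(text: str) -> str:
    """Escape Text-node data so it is safe as XML character content."""
    text = text.replace("&", "&amp;")      # must come first, or later '&'s get double-escaped
    text = text.replace("<", "&lt;")
    text = text.replace("]]>", "]]&gt;")   # ']]>' may not appear literally in content
    # Note: this sketch does not reject characters XML 1.0 forbids outright
    # (e.g. U+0001), and does not protect a literal CR from end-of-line normalization.
    return text


def encode_for_output(xml_text: str, encoding: str) -> bytes:
    """Encode serialized XML; characters outside the charset become &#NNN; references.

    Only valid where character references are allowed (i.e. content, not names).
    """
    return xml_text.encode(encoding, errors="xmlcharrefreplace")


class _TextCollector(ContentHandler):
    """SAX handler that accumulates all character data it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def characters(self, content: str) -> None:
        # The parser may deliver text in several chunks, so collect them all.
        self.parts.append(content)


def round_trip(text: str, encoding: str = "us-ascii") -> str:
    """Serialize text as the content of a hypothetical <t> element, then parse it back with SAX."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><t>{escape_markup(text)}</t>'
    handler = _TextCollector()
    xml.sax.parseString(encode_for_output(doc, encoding), handler)
    return "".join(handler.parts)
```

For example, `round_trip("a < b && c ]]> d é €")` writes `é` and `€` as `&#233;` and `&#8364;` in US-ASCII output and hands back the original string after parsing. Replacing only the `>` of a `]]>` sequence follows the reply's wording; escaping every `>` as `&gt;` would also be legal and is a common simpler choice.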
Received on Friday, 16 April 1999 12:57:28 UTC