- From: Richard Tobin <richard@cogsci.ed.ac.uk>
- Date: Wed, 31 Jan 2001 17:08:51 GMT
- To: www-xml-infoset-comments@w3.org
As requested by the XML Core WG, I am recording some arguments in favour of retaining CDATA section boundaries in the Infoset.

An application that receives its input from an XML parser will see no difference between text escaped with character references and text escaped with CDATA sections. However, documents are not always consumed through a parser, and it may be desirable to preserve CDATA sections in output. For example:

- Text escaped with CDATA sections may be more readable by humans. This is especially true for "quoted XML".

- It may also improve interoperability with non-XML tools. For example, it is entirely reasonable to run "grep" on an XML file, or on a directory containing both XML and non-XML files. A search for "AT&T" will match AT&T inside a CDATA section but will not match AT&amp;T.

- There may even be applications that extract text (such as scripts) from XML documents without parsing, on the assumption that the relevant text is contained in a CDATA section.

The presence of CDATA section boundaries in the Infoset will encourage this preservation (though it is not, of course, required for it).

Some have argued that it may be impossible to output text as a CDATA section in some encodings, because a character the encoding cannot represent cannot be written as a character reference inside a CDATA section. That argument may be irrelevant to the users in question, since their editors and other tools may well work with only one particular encoding anyway. Certainly many users would find it unacceptable if running XInclude or XSLT on their documents changed the encoding.

-- Richard
Received on Wednesday, 31 January 2001 12:08:53 UTC