Request for clarification from mfinney@lynchburg.net on 2000-02-10 (xml-editor@w3.org from January to March 2000)

From: <mfinney@lynchburg.net>
Date: 10 Feb 2000 20:34:37 GMT
To: xml-editor@w3.org
Message-Id: <200002101541768.SM00166@mfinney>

I would like to request a clarification on the grammar for CDATA sections.
Production [20] states...

   [20]  CData ::=  (Char* - (Char* ']]>' Char*))

which would imply that characters which do not meet the Char [2] production
are not allowed in CDATA sections.  Further, section 2.11 states...

   To simplify the tasks of applications, wherever an external parsed
   entity or the literal entity value of an internal parsed entity contains
   either the literal two-character sequence "#xD#xA" or a standalone
   literal #xD, an XML processor must pass to the application the single
   character #xA. (This behavior can conveniently be produced by normalizing
   all line breaks to #xA on input, before parsing.) 

Because of the last sentence, this implies that new-line processing also
takes place within CDATA sections.
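
To illustrate the concern, here is a minimal Python sketch of the
normalization that the parenthetical in section 2.11 describes. The function
name normalize_line_breaks is hypothetical and not taken from the
specification; the sketch only assumes that normalization is applied to the
whole input before parsing, which would include the content of any CDATA
section.

```python
def normalize_line_breaks(text: str) -> str:
    """Map each CR LF pair and each standalone CR to a single LF.

    Assumption: this runs over the entire entity before parsing, so it
    cannot tell CDATA content apart from any other text.
    """
    # Handle the two-character sequence first so it becomes one LF,
    # then convert any carriage returns that remain on their own.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A CDATA section whose content holds a CR LF pair and a lone CR.
raw = "<![CDATA[a\r\nb\rc]]>"
normalized = normalize_line_breaks(raw)
```

Under that assumption, any #xD inside the CDATA section is rewritten before
the application sees it, so the original bytes would not be recoverable.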

However, Tim Bray states in his annotation to XML...

   When you look at CDATA, you might get the impression that you could
   maybe jam your binary data in a CDATA section. You'd be right, but
   you'd have to guarantee that it never included a byte sequence that
   looks like ]]>.

which would be incorrect if either only Char data is allowed or new-line
processing does take place as described.
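
As a rough sketch of the two restrictions in question, the following Python
checks a candidate CDATA payload against the Char production [2] of XML 1.0
and against the line-break rule of section 2.11. The names is_xml_char and
cdata_problems are hypothetical, and the payload is assumed to be already
decoded into characters.

```python
from typing import List


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production [2]."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def cdata_problems(payload: str) -> List[str]:
    """List the reasons a payload could not pass through CDATA unchanged."""
    problems = []
    if "]]>" in payload:
        problems.append("contains the CDATA end delimiter")
    if any(not is_xml_char(ch) for ch in payload):
        problems.append("contains characters outside the Char production")
    if "\r" in payload:
        problems.append("contains #xD, which would be normalized to #xA")
    return problems
```

If both restrictions apply, a payload containing, say, #x0 or #xD would be
reported here even though it never contains the sequence ]]>.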

Is it the intention that either restriction apply to CDATA sections?  Or any
other restriction that would prevent binary data (other than ]]>) from
being represented in CDATA sections?  Thank you.
Michael Lee Finney
   michael.finney@acm.org
   michael.finney@computer.org

Received on Thursday, 10 February 2000 15:34:37 UTC