John Boyer wrote: > On the other hand, the XML rule for element 'content' refers > to 'CharData', which only forbids the use of less-than (<) > and ampersand (&) in character content. Not quite. Production [14] reads: [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*) The notation for productions is defined in section 6 (http://www.w3.org/TR/REC-xml#sec-notation), where we have in particular: [^abc], [^#xN#xN#xN] matches any Char with a value not among the characters given. where "Char" is a link to production [2] Char. This is from the second edition. The first edition said "matches any character with a value not among the characters given.", where "character" was a link to the definition of character just above production [2]: "A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646." So the controls (except tab, carriage return, line feed) were already excluded. The change was first effected by E93 to the first edition (http://www.w3.org/XML/xml-19980210-errata#E93) published in July 2000. > The canonicalization > rule for text node processing was based on the CharData rule, > so it is possible to get a correct c14n program to write data > that Xerces cannot read and that is possibly not well-formed XML. Not by the above. > If there is an erratum that rewrites rule 14 in a manner > similar to the rule above (such that CharData reflects the > restrictions of Char), then the C14N Recommendation will need > an erratum to the processing model for text nodes. I don't think [14] needs an erratum, but C14N probably so. Regards, -- François YergeauReceived on Thursday, 20 February 2003 11:03:18 GMT
This archive was generated by hypermail 2.2.0 + w3c-0.29 : Thursday, 13 January 2005 12:10:16 GMT