RE: Possible XML and C14N errata

John Boyer wrote:
> On the other hand, the XML rule for element 'content' refers 
> to 'CharData', which only forbids the use of less-than (<) 
> and ampersand (&) in character content.

Not quite.  Production [14] reads:

[14]    CharData    ::=    [^<&]* - ([^<&]* ']]>' [^<&]*) 

The notation for productions is defined in section 6
(http://www.w3.org/TR/REC-xml#sec-notation), where we have in particular:

[^abc], [^#xN#xN#xN]
         matches any Char with a value not among the characters given.

where "Char" is a link to production [2] Char.

This is from the second edition.  The first edition said "matches any
character with a value not among the characters given.", where "character"
was a link to the definition of character just above production [2]: "A
character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC
10646]. Legal characters are tab, carriage return, line feed, and the legal
graphic characters of Unicode and ISO/IEC 10646."  So the controls (except
tab, carriage return, line feed) were already excluded.

The change was first effected by E93 to the first edition
(http://www.w3.org/XML/xml-19980210-errata#E93) published in July 2000.
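Read this way, [14] already inherits the restrictions of [2]. Here is a minimal Python sketch, assuming the Second Edition definition of Char (production [2]: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The names `_CHARDATA_RUN` and `is_chardata` are hypothetical and for illustration only. Python's re has no character-class subtraction, so the ranges around '<' and '&' are split by hand.

```python
import re

# Production [2] Char (XML 1.0 Second Edition) minus '<' (#x3C) and '&' (#x26).
# The ranges are split by hand: #x20-#x25, then #x27-#x3B, then #x3D-#xD7FF.
_CHAR_NOT_LT_AMP = (
    r"[\t\n\r\x20-\x25\x27-\x3B\x3D-\uD7FF"
    r"\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
_CHARDATA_RUN = re.compile(_CHAR_NOT_LT_AMP + "*")


def is_chardata(s: str) -> bool:
    """True if s matches [14]: a run of [^<&] (read as Chars) containing no ']]>'."""
    return _CHARDATA_RUN.fullmatch(s) is not None and "]]>" not in s


if __name__ == "__main__":
    for sample in ["a > b", "a < b", "x]]>y", "bell\x07", "tab\there"]:
        print(repr(sample), is_chardata(sample))
```

A control such as U+0007 is rejected by the character class itself. No separate rule is needed for it.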


> The canonicalization 
> rule for text node processing was based on the CharData rule, 
> so it is possible to get a correct c14n program to write data 
> that Xerces cannot read and that is possibly not well-formed XML.  

Not by the above.


> If there is an erratum that rewrites rule 14 in a manner 
> similar to the rule above (such that CharData reflects the 
> restrictions of Char), then the C14N Recommendation will need 
> an erratum to the processing model for text nodes.

I don't think [14] needs an erratum, but C14N probably does.
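To make the C14N side concrete: the C14N 1.0 processing model outputs a text node's string value with only four substitutions (& to &amp;, < to &lt;, > to &gt;, #xD to &#xD;). Any other character is copied through unchanged. The rough Python sketch below is not taken from either spec. `c14n_text` and `only_chars` are hypothetical helpers. The U+0001 input also assumes a data model that somehow holds a non-Char, which parsing well-formed XML cannot produce.

```python
import re

# Production [2] Char from XML 1.0 (Second Edition).
_CHAR_RUN = re.compile(
    r"[\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def c14n_text(value: str) -> str:
    """Escape a text node's string value per the C14N 1.0 text-node rule:
    '&', '<', '>' and #xD are replaced; every other character is copied as-is."""
    # '&' goes first so the ampersands introduced below are not escaped again.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\r", "&#xD;"))


def only_chars(s: str) -> bool:
    """True if every character of s matches production [2] Char."""
    return _CHAR_RUN.fullmatch(s) is not None


if __name__ == "__main__":
    ok = c14n_text("a<b & c>d\r")
    bad = c14n_text("x\x01y")  # hypothetical text value holding U+0001
    print(ok, only_chars(ok))
    print(repr(bad), only_chars(bad))
```

The escaping never removes a non-Char, so whatever the data model holds ends up in the output. If anything needs tightening, it is the C14N text-node rule, not [14].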

Regards,

-- 
François Yergeau

Received on Thursday, 20 February 2003 11:03:18 UTC