- From: Paul Duffin <pduffin@volantis.com>
- Date: Fri, 12 Jul 2002 09:21:28 -0400 (EDT)
- To: xml-editor@w3.org
This question relates to the 2nd Edition of the XML Version 1.0 specification.

http://www.w3.org/TR/REC-xml

I have looked in the archives and errata but cannot find an answer to my question, although there are some unanswered questions which touch on it.

The second example seems incorrect. As I understand the rules listed in section 3.3.3, the attribute specification

a="&d;&d;A&a;&a;B&da;"

should be normalized to

#xD #xD A #xA #xA B #xD #xA

for both CDATA and non-CDATA declared attributes.

Here is how I think the algorithm as described works.

1) Normalization of line breaks has no effect as there are no line breaks in the example.

2) The normalized string starts out as "".

3) This applies to each character, entity reference or character reference in the UNNORMALIZED attribute value. It has four different rules, which I assume are labelled 3a, 3b, 3c and 3d in document order.

Processing the UNNORMALIZED attribute value goes as follows.

&d; is an entity reference so rule 3b applies, so apply the rules to the entity's replacement text.
&#xD; is a character reference so rule 3a applies, which means that we have to add #xD to the NORMALIZED value.
&d; ditto.
A is another character so rule 3d applies, so we add A to the NORMALIZED value.
&a; is an entity reference so rule 3b applies, so apply the rules to the entity's replacement text.
&#xA; is a character reference so rule 3a applies, which means that we have to add #xA to the NORMALIZED value.
&a; ditto.
B is another character so rule 3d applies, so we add B to the NORMALIZED value.
&da; is an entity reference so rule 3b applies, so apply the rules to the entity's replacement text.
&#xD; is a character reference so rule 3a applies, which means that we have to add #xD to the NORMALIZED value.
&#xA; is a character reference so rule 3a applies, which means that we have to add #xA to the NORMALIZED value.

I have just done some more reading of the specification and realise that example 2 is correct. The reason is that the literal entity value has already had any character references resolved before it is processed by the attribute value normalization rules, so the replacement text contains the white space characters themselves and rule 3c applies to them rather than rule 3a.

It would be much clearer if rule 3b contained a reference to section 4.5, which describes the "Construction of Internal Entity Replacement Text". An example which illustrates this would also be good, e.g.

Given <!ENTITY literal-a "&#xD;"> then a="&literal-a;A;&literal-a;B;&literal-a;" would be normalized to #xA A #xA B #xA for both CDATA and non-CDATA attributes.

A more detailed working through of the examples would also be useful, specifying the replacement text of the entities before normalization. This could perhaps be added to appendix D.
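The normalization steps walked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from the specification: the function name normalize_attribute, the entity table and the "lit" entity are hypothetical, the entity table is assumed to hold replacement text as already constructed per section 4.5 (character references in the literal entity value resolved), and the input is assumed to be well-formed.

```python
import re

# One token of an attribute value: a hex or decimal character reference,
# an entity reference, or any other single character.
TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^;&\s]+);|(.)", re.DOTALL)


def normalize_attribute(raw, entities, cdata=True):
    """Sketch of attribute-value normalization (section 3.3.3).

    `entities` maps entity names to REPLACEMENT TEXT (an assumption of this
    sketch), i.e. the text after section 4.5 processing.
    """
    # Step 1: line-break normalization of literal line ends in the input.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []  # Step 2: begin with an empty normalized value.

    def walk(text):
        # Step 3: handle each character, entity ref or character ref.
        for m in TOKEN.finditer(text):
            hex_ref, dec_ref, name, char = m.groups()
            if hex_ref is not None:
                out.append(chr(int(hex_ref, 16)))  # 3a: referenced character
            elif dec_ref is not None:
                out.append(chr(int(dec_ref)))      # 3a: referenced character
            elif name is not None:
                walk(entities[name])               # 3b: recurse into replacement text
            elif char in "\t\n\r ":
                out.append(" ")                    # 3c: white space becomes #x20
            else:
                out.append(char)                   # 3d: any other character

    walk(raw)
    value = "".join(out)
    if not cdata:
        # Non-CDATA: collapse runs of #x20 only and trim them at both ends.
        value = re.sub(" +", " ", value).strip(" ")
    return value


# Replacement text for the entities d, a and da used in the spec examples.
ENTITIES = {"d": "\r", "a": "\n", "da": "\r\n"}

print(repr(normalize_attribute("&d;&d;A&a;&a;B&da;", ENTITIES)))         # '  A  B  '
print(repr(normalize_attribute("&d;&d;A&a;&a;B&da;", ENTITIES, False)))  # 'A B'
print(repr(normalize_attribute("&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;", ENTITIES)))
# '\r\rA\n\nB\r\n'

# Hypothetical entity whose replacement text still holds a character
# reference, so rule 3a (not 3c) applies when it is expanded.
print(repr(normalize_attribute("&lit;A", {"lit": "&#xA;"})))             # '\nA'
```

The point of keeping replacement text, rather than the literal entity value, in the table is that it makes the difference between rules 3a and 3c visible: a white space character that arrives through an entity's replacement text becomes #x20, while one that arrives through a character reference is kept as it is.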
Received on Friday, 12 July 2002 09:59:35 UTC