Attribute normalization from Kevin Regan on 2000-06-23 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: Kevin Regan <kevinr@valicert.com>
Date: Fri, 23 Jun 2000 15:26:23 -0700
To: John Boyer <jboyer@PureEdge.com>
Cc: w3c-ietf-xmldsig@w3.org
Message-id: <27FF4FAEA8CDD211B97E00902745CBE2015B7ADA@seine.valicert.com>

I have a question about section 4 of the XML C14N spec.  That section
describes the normalization of attributes:

-------------------------------------------------------

Namespace and Attribute Nodes- a space, the node's QName, an equals
sign, an open double quote, the modified string value, and a close
double quote. The string value of the node is modified by replacing all
ampersands (&) with &amp;, all double quote characters with &quot;, and
the whitespace characters #x9, #xA, and #xD, with character references.
The character references are written in uppercase hexadecimal with no
leading zeroes (for example, #xD is represented by the character
reference &#xD;).

--------------------------------------------------------
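
As a rough illustration of the quoted rule, here is a minimal sketch in
Python. The language and the function names (c14n_attr_value, c14n_attr)
are assumptions made for this example, not anything from the spec, and
the sketch covers only the replacements named in the paragraph above.

```python
def c14n_attr_value(value: str) -> str:
    """Escape an attribute string value as the quoted paragraph describes."""
    # Replace ampersands first, so the ampersands introduced by the
    # later replacements are not escaped a second time.
    out = value.replace("&", "&amp;")
    out = out.replace('"', "&quot;")
    # #x9, #xA and #xD become character references in uppercase
    # hexadecimal with no leading zeroes.
    for ch in ("\t", "\n", "\r"):
        out = out.replace(ch, "&#x%X;" % ord(ch))
    return out


def c14n_attr(qname: str, value: str) -> str:
    """A space, the QName, an equals sign, and the quoted, modified value."""
    return ' %s="%s"' % (qname, c14n_attr_value(value))


print(c14n_attr("title", 'A & B say "hi"\r\n'))
```

Under this reading, a carriage return in the string value is written
out as &#xD; and a plain space (#x20) is left untouched.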

However, when an XML processor reads in and parses an XML document, it
should do the following (from the XML 1.0 spec, section 3.3.3):

----------------------------------------------------

3.3.3 Attribute-Value Normalization
Before the value of an attribute is passed to the application or checked
for validity, the XML processor must normalize it as follows: 

-- a character reference is processed by appending the referenced
character to the attribute value 
-- an entity reference is processed by recursively processing the
replacement text of the entity 
-- a whitespace character (#x20, #xD, #xA, #x9) is processed by
appending #x20 to the normalized value, except that only a single #x20
is appended for a "#xD#xA" sequence that is part of an external parsed
entity or the literal entity value of an internal parsed entity 
-- other characters are processed by appending them to the normalized
value

If the declared value is not CDATA, then the XML processor must further
process the normalized attribute value by discarding any leading and
trailing space (#x20) characters, and by replacing sequences of space
(#x20) characters by a single space (#x20) character.

All attributes for which no declaration has been read should be treated
by a non-validating parser as if declared CDATA.

--------------------------------------------------------

So it seems that only #x20 characters will ever be seen in attribute
values.  Why does the C14N spec mention the other whitespace characters
(#xD, #xA, #x9)?
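
One way to probe this is to check what a parser actually hands to the
application. The sketch below assumes Python's standard
xml.etree.ElementTree module (a choice made for this example; neither
spec names a parser), and the test document with its attribute names
"lit" and "ref" is made up. It compares a literal tab against a
character reference for the same character.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: "lit" holds a literal tab character,
# "ref" holds the character reference &#x9; for the same character.
doc = '<e lit="a\tb" ref="a&#x9;b"/>'


def parsed_attrs(xml_text: str) -> dict:
    """Return attribute values as the parser passes them to the application."""
    return dict(ET.fromstring(xml_text).attrib)


attrs = parsed_attrs(doc)
for name in sorted(attrs):
    # repr() makes a tab ('\t') distinguishable from a space.
    print(name, repr(attrs[name]))
```

If the value written as a character reference comes through as a
literal tab, that may be the case the C14N text is meant to cover.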

Thanks,
Kevin Regan

kevinr@valicert.com

Received on Friday, 23 June 2000 18:33:05 UTC