Re: Signed-XML and White Space?

At 05:20 PM 6/23/99 -0700, Bugbee, Larry wrote:
>When you sign something you may or may not care about white space (extra
>spaces, CR/LF, etc.). For example, you might not care about white space if
>it is a paragraph of simple text and your assertion is that those words are
>yours. Line breaks are unimportant, giving the renderer choices. So, signing
>words and their breaks is all that is necessary.
>
>If, however, the content were a purchase order, spacing is extremely
>important lest the right numbers appear in the wrong columns. Here we should
>sign all the white space and have it preserved upon rendering.
>How should we go about this? Do we care?
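One way to make the two policies above concrete is to hash a canonical form of the document and choose whether layout white space is part of what gets hashed. This is only a sketch under assumptions, not anything taken from the C14N draft [3]: it uses canonicalize() from Python's xml.etree.ElementTree (Python 3.8+, which implements the later Canonical XML 2.0), SHA-256 as the digest, and a hypothetical digest() helper and <po> sample.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def digest(xml_text: str, ignore_layout: bool) -> str:
    """Hypothetical helper: SHA-256 over the canonical form of xml_text.

    ignore_layout=True passes strip_text=True, so white space before and after
    text content (indentation, line breaks between elements) is dropped before
    hashing. ignore_layout=False keeps every character of element content.
    """
    canonical = canonicalize(xml_text, strip_text=ignore_layout)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = "<po><qty>5</qty><price>10</price></po>"
indented = "<po>\n  <qty>5</qty>\n  <price>10</price>\n</po>"

# Strict policy: re-indenting the purchase order changes the digest.
print(digest(compact, False) == digest(indented, False))  # False
# Layout-insensitive policy: the two documents hash identically.
print(digest(compact, True) == digest(indented, True))  # True
```

Note that strip_text trims leading and trailing white space from every text node, so it would also discard deliberate spacing inside a value. For the purchase-order case, the strict policy (ignore_layout=False) is the safer choice.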

It certainly could matter. I think the current trend is to say it doesn't,
but perhaps an XML expert can give a better answer, or the C14N spec will
once it goes public [3]. Part of the question is which kind of white space
we are speaking of: white space within the markup itself (?), within
attribute values (XML normalizes these, see [1]), within CDATA sections (?),
or within element content (if xml:space="preserve"). I think that because
the canonical form is standalone="yes", a couple of things are restricted.
For element content, the chain of references to examine is:

[0] http://www.w3.org/TR/REC-xml#sec-rmd
The standalone document declaration must have the value "no" if any external
markup declarations contain declarations of ... attributes with values
subject to normalization, where the attribute appears in the document with a
value which will change as a result of normalization, or element types with
element content, if white space occurs directly within any instance of those
types. 
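As a rough illustration of the last clause quoted in [0], here is a sketch that reports whether white space occurs directly within an instance of an element type declared with element content. It assumes the caller already knows which element types have element content (ElementTree does not read the DTD, so element_content_types is a hypothetical input), and it checks only this one clause, not the attribute-default, entity, or attribute-normalization conditions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"  # the four XML white space characters: #x20, #x9, #xD, #xA


def needs_standalone_no(xml_text: str, element_content_types: set[str]) -> bool:
    """Sketch of one clause of XML 1.0 section 2.9 (standalone declaration).

    Returns True if white space occurs directly within an instance of an
    element type declared with element content. element_content_types is a
    hypothetical caller-supplied set, since ElementTree ignores the DTD.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in element_content_types:
            continue
        # Character data directly inside elem: the text before its first
        # child and the tail text after each child.
        chunks = [elem.text] + [child.tail for child in elem]
        if any(chunk and not chunk.strip(XML_WHITESPACE) for chunk in chunks):
            return True
    return False


indented = "<po>\n  <qty>5</qty>\n</po>"
print(needs_standalone_no(indented, {"po"}))  # True
print(needs_standalone_no("<po><qty>5</qty></po>", {"po"}))  # False
```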

[1] http://www.w3.org/TR/REC-xml#sec-white-space
3.3.3 Attribute-Value Normalization. ... a whitespace character (#x20, #xD,
#xA, #x9) is processed by appending #x20 to the normalized value, except
that only a single #x20 is appended for a "#xD#xA" sequence that is part of
an external parsed entity or the literal entity value of an internal parsed
entity... If the declared value is not CDATA, then the XML processor must
further process the normalized attribute value by discarding any leading and
trailing space (#x20) characters, and by replacing sequences of space (#x20)
characters by a single space (#x20) character.
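The normalization steps quoted from [1] can be sketched as follows. This is a simplified illustration under stated assumptions: the value contains no entity or character references (the spec handles those separately; a character reference such as &#xA; is kept as-is rather than turned into a space), and normalize_attribute_value and its declared_cdata flag are hypothetical names.

```python
XML_WHITESPACE = "\x20\x0d\x0a\x09"  # #x20, #xD, #xA, #x9


def normalize_attribute_value(raw: str, declared_cdata: bool = True) -> str:
    """Hypothetical sketch of XML 1.0 section 3.3.3 for a reference-free value."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\r" and raw[i + 1 : i + 2] == "\n":
            out.append(" ")  # a #xD#xA pair yields a single #x20
            i += 2
            continue
        # Every other white space character becomes one #x20.
        out.append(" " if ch in XML_WHITESPACE else ch)
        i += 1
    value = "".join(out)
    if not declared_cdata:
        # Non-CDATA types: drop leading/trailing #x20, collapse runs to one.
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_attribute_value("a\tb\r\nc")))  # 'a b c'
print(repr(normalize_attribute_value("  a \n  b ", declared_cdata=False)))  # 'a b'
```

For literal text in the document entity, end-of-line handling (XML 1.0, section 2.11) has already turned each #xD#xA pair into a single #xA, so either way such a pair yields one space.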

[2] http://www.w3.org/TR/xml-infoset#infoitem.character
2.6.1 ... A flag indicating whether the character is whitespace appearing
within element content (see [XML], 2.10 "White Space Handling"). Validating
processors are required by XML 1.0 to provide this information;
non-validating processors may always set this flag to false. 

[3] http://www.w3.org/XML/Group/1999/06/xml-c14n-19990622.html#chardata-info
2.6 Character Information Items. The canonical form conveys all of the
required properties of the character information item except for the flag
indicating whitespace within element content.
None of the optional properties of the character information item are
conveyed. In particular, no CDATA sections occur on the canonical form;
markup characters occurring in CDATA sections are escaped in the canonical
form just as are all other such characters. 
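To see the CDATA point from [3] in practice, the sketch below canonicalizes an element written once with a CDATA section and once with escaped markup characters. It rests on an assumption: it uses Python's xml.etree.ElementTree.canonicalize() (Python 3.8+), which implements Canonical XML 2.0 rather than the June 1999 draft cited in [3], though the idea is the one quoted there: the CDATA section disappears and its markup characters are escaped like any others.

```python
from xml.etree.ElementTree import canonicalize

# The same character data, written with and without a CDATA section.
with_cdata = "<expr><![CDATA[a < b && c > d]]></expr>"
escaped = "<expr>a &lt; b &amp;&amp; c &gt; d</expr>"

print(canonicalize(with_cdata))  # <expr>a &lt; b &amp;&amp; c &gt; d</expr>
# Both spellings have the same canonical form, so they would sign identically.
print(canonicalize(with_cdata) == canonicalize(escaped))  # True
```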

_________________________________________________________
Joseph Reagle Jr.   
Policy Analyst           mailto:reagle@w3.org
XML-Signature Co-Chair   http://w3.org/People/Reagle/

Received on Friday, 25 June 1999 15:01:40 UTC