- From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
- Date: Tue, 25 Jan 2000 16:18:30 -0500
- To: w3c-ietf-xmldsig@w3.org
<H2>7.0 <A name=sec-XML-Canonicalization>XML Canonicalization</A> and Syntax
Constraint Considerations</H2>
<P>Digital signatures only work if the verification calculations are performed
on exactly the same bits as the signing calculations. If the surface
representation of the signed data can change between signing and verification,
then some way to standardize the changeable aspect must be used before signing
and verification. For example, even for simple ASCII text there are at least
three widely used line ending sequences. If it is possible for signed text to be
modified from one line ending convention to another between the time of signing
and signature verification, then the line endings need to be canonicalized to a
standard form before signing and verification; otherwise, the signatures will break. </P>
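<P>The line-ending case can be sketched in Python. This is a minimal illustration and not part of any specification. The helper name <TT>normalize_line_endings</TT> is hypothetical, and SHA-256 is an arbitrary choice of digest. LF is chosen as the standard form only because it matches the XML 1.0 rule described in 7.1 below. Without normalization, three encodings of the same text give three different digests. With normalization, they give one.</P>

```python
import hashlib


def normalize_line_endings(text: str) -> str:
    """Map CR LF and lone CR to LF (hypothetical helper, LF chosen as the standard form)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same two lines of text under three line-ending conventions.
variants = ("line one\nline two\n", "line one\r\nline two\r\n", "line one\rline two\r")

# Digests over the raw text: one distinct digest per convention.
raw_digests = {hashlib.sha256(t.encode("utf-8")).hexdigest() for t in variants}

# Digests after canonicalizing line endings: all variants agree.
canon_digests = {
    hashlib.sha256(normalize_line_endings(t).encode("utf-8")).hexdigest()
    for t in variants
}

print(len(raw_digests), len(canon_digests))  # 3 1
```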
<P>XML is subject to surface representation changes and to processing which
discards some surface information. For this reason, XML digital signatures have
a provision for indicating canonicalization methods in the signature so that a
verifier can use the same canonicalization as the signer. </P>
<P>Throughout this document we distinguish between the canonicalization of a
<TT>Signature</TT> data object and other signed XML data objects. It is possible
for an isolated XML document to be treated as if it were binary data so that no
changes can occur. In that case, the digest of the document will not change and
it need not be canonicalized if it is signed and verified as such. However, XML
that is read and processed using standard XML parsing and processing techniques
is frequently changed such that some of its surface representation information
is lost or modified. In particular, this will occur in many cases for the
<TT>Signature</TT> and enclosed <TT>SignedInfo</TT> elements since they, and
possibly an encompassing XML document, will be processed as XML. </P>
<P>Similarly, these considerations apply to <TT>Manifest</TT>, <TT>Object</TT>,
and <TT>SignatureProperties</TT> elements if those elements have been digested,
their <TT>DigestValue</TT> is to be checked, and they are being processed as
XML.</P>
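<P>The contrast between binary and XML treatment can be sketched in Python. This assumes a hypothetical signed document and uses the standard library's <TT>xml.etree.ElementTree</TT> as the XML processor. Hashing the exact octets gives a stable digest. Parsing and re-serializing the same document produces different octets (here, different attribute quoting and empty-element syntax), so a digest computed over the original bytes no longer matches.</P>

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical signed document, exactly as the signer's octets.
signed_bytes = b"<doc a='1'>\n  <e/>\n</doc>"
signed_digest = hashlib.sha256(signed_bytes).hexdigest()

# Treated as opaque binary: the verifier recomputes over identical octets.
binary_ok = hashlib.sha256(signed_bytes).hexdigest() == signed_digest

# Processed as XML: parse into a tree, then serialize the tree again.
reserialized = ET.tostring(ET.fromstring(signed_bytes))
xml_ok = hashlib.sha256(reserialized).hexdigest() == signed_digest

print(binary_ok, xml_ok)  # True False
print(reserialized)       # same information, different surface form
```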
<P>The kinds of changes in XML that may need to be canonicalized can be divided
into three categories. There are those related to the basic [XML], as described
in 7.1 below. There are those related to [DOM], [SAX], or similar processing as
described in 7.2 below. And, third, there is the possibility of character set
conversion, such as between UTF-8 and UTF-16, both of which all standards-compliant
XML processors are required to support. Any canonicalization algorithm
should yield output in a specific fixed character set. For both the minimal
canonicalization defined in this document and the W3C Canonical XML [<A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML-c14n">XML-c14n</A>],
that character set is UTF-8. </P>
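<P>A short Python sketch of why output must be in one fixed character set. It assumes the source encoding is already known; in practice an XML processor determines it from the document. The helper <TT>to_utf8</TT> is hypothetical. The same characters serialized as UTF-8 and as UTF-16 are different octet sequences. Decoding and re-encoding both as UTF-8 makes their digests agree. The sample deliberately omits an XML declaration, because a declaration naming <TT>UTF-16</TT> would also need rewriting, which this sketch does not attempt.</P>

```python
import hashlib


def to_utf8(data: bytes, encoding: str) -> bytes:
    # Hypothetical helper: assumes the source encoding is known.
    return data.decode(encoding).encode("utf-8")


text = "<doc>caf\u00e9</doc>"
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")  # Python's utf-16 codec prepends a byte order mark

print(as_utf8 == as_utf16)  # False: different octets for the same characters

# The utf-16 decoder consumes the byte order mark, so both inputs
# reduce to the same UTF-8 octets before digesting.
digest_utf8 = hashlib.sha256(to_utf8(as_utf8, "utf-8")).hexdigest()
digest_utf16 = hashlib.sha256(to_utf8(as_utf16, "utf-16")).hexdigest()
print(digest_utf8 == digest_utf16)  # True
```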
<H3>7.1 <A name=sec-XML-1>XML 1.0</A>, Syntax Constraints, and
Canonicalization</H3>
<P>XML 1.0 [<A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML">XML</A>]
defines an interface where a conformant application reading XML is given certain
information from that XML and not other information. In particular,
<OL>
<LI>line endings are normalized to the single character #xA by dropping #xD
characters if they are immediately followed by a #xA and replacing them with
#xA in all other cases,
<LI>missing attributes declared to have default values are provided to the
application as if present with the default value,
<LI>character references are replaced with the corresponding character,
<LI>entity references are replaced with the replacement text of the corresponding declared entity,
<LI>attribute values are normalized by
<OL type=A>
<LI>replacing character and entity references as above,
<LI>replacing occurrences of #x9, #xA, and #xD with #x20 (space) except that
the sequence #xD#xA is replaced by a single space, and
<LI>if the attribute is not declared to be CDATA, stripping all leading and
trailing spaces and replacing all interior runs of spaces with a single
space, and </LI></OL>
<LI>white space that appears directly within the content of an element declared
to have element content, but not within the content of any enclosed element, is
eliminated. </LI></OL>
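<P>Several of these rules can be observed with an ordinary parser. The following Python sketch uses the standard library's <TT>xml.etree.ElementTree</TT> (backed by Expat) as the conformant processor. The sample documents and the helper <TT>application_view</TT> are hypothetical. One form uses an internal entity, a character reference, CR LF line endings, and a line break inside an attribute value. The other form has all of that already resolved. Both give the application the same tag, attributes, and text. Defaulted attributes (item 2) are not exercised here.</P>

```python
import xml.etree.ElementTree as ET

# Surface form with an internal entity, a character reference (&#33; is "!"),
# CR LF line endings, and a line break inside an attribute value.
form_a = (
    '<!DOCTYPE doc [<!ENTITY who "world">]>\r\n'
    '<doc note="a\r\nb">Hello &who;&#33;\r\nBye</doc>'
)

# Surface form with everything already resolved and normalized.
form_b = '<doc note="a b">Hello world!\nBye</doc>'


def application_view(xml_text: str):
    """Return what the application sees: tag, attributes, and text content."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib), root.text


print(application_view(form_a) == application_view(form_b))  # True
```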
<P>Note that items (2), (4), (5C), and (6) depend on specific Schema, DTD, or
similar declarations. In the general case, such declarations will not be
available to or used by the signature verifier. Thus, to interoperate between
different XML implementations, the following syntax constraints MUST be
observed when generating any signed material to be processed as XML,
including the <TT>SignedInfo</TT> element:
<OL>
<LI>attributes having default values are explicitly present,
<LI>all entity references (except "amp", "lt", "gt", "apos", and "quot", which
are predefined) are expanded,
<LI>attribute value white space is normalized, and
<LI>insignificant white space is not generated within elements having element
content. </LI></OL>
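<P>A signer could partly check some of these constraints before signing. The following is a rough heuristic sketch in Python that uses regular expressions rather than a real XML parser. The function name <TT>check_syntax_constraints</TT> is hypothetical. It flags <TT>ATTLIST</TT> declarations, which may supply defaulted attributes. It also flags references to entities other than the five predefined ones, and tab, line-feed, or carriage-return characters left in attribute values. It cannot detect insignificant white space in element content, since that depends on declarations it does not read. It may also misjudge unusual markup, such as a <TT>&gt;</TT> inside an attribute value.</P>

```python
import re

PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}
ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")   # named refs only, not &#...;
START_TAG = re.compile(r"<[A-Za-z_:][^<>]*>")       # start/empty tags, skips <! <? </
ATTR_VALUE = re.compile(r"""=\s*("[^"]*"|'[^']*')""")


def check_syntax_constraints(text: str) -> list[str]:
    """Heuristically list likely violations of the signing syntax constraints."""
    problems = []
    if "<!ATTLIST" in text:
        problems.append("ATTLIST declaration: defaulted attributes may be omitted")
    for name in ENTITY_REF.findall(text):
        if name not in PREDEFINED:
            problems.append(f"unexpanded entity reference &{name};")
    for tag in START_TAG.findall(text):
        for value in ATTR_VALUE.findall(tag):
            if re.search(r"[\t\n\r]", value):
                problems.append(f"unnormalized attribute value {value!r}")
    return problems


good = '<SignedInfo Id="si">a &amp; b</SignedInfo>'
bad = '<!DOCTYPE d [<!ATTLIST d x CDATA "1">]><d y="a\tb">&foo;</d>'
print(check_syntax_constraints(good))  # []
for problem in check_syntax_constraints(bad):
    print(problem)
```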
<H3>7.2 <A name=sec-DOM-SAX>DOM/SAX</A> Processing and Canonicalization</H3>
<P>In addition to the canonicalization and syntax constraints discussed above,
many XML applications use the Document Object Model [<A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-DOM">DOM</A>]
or The Simple API for XML [<A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-SAX">SAX</A>].
DOM maps XML into a tree structure of nodes and typically assumes it will be
used on an entire document with subsequent processing being done on this tree.
SAX converts XML into a series of events such as a start tag, content, etc. In
either case, many surface characteristics, such as the ordering of attributes and
insignificant white space within start/end tags, are lost. In addition, namespace
declarations are mapped over the nodes to which they apply, losing the namespace
prefixes in the source text and, in most cases, losing the record of where namespace
declarations appeared in the original instance.</P>
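<P>This loss can be seen with a tree-based API. The Python sketch below uses the standard library's <TT>xml.etree.ElementTree</TT> as a stand-in for DOM; it is not DOM itself, and it happens to keep attribute order. The source document is hypothetical. After parsing, the element name carries only the namespace URI. On re-serialization, the original prefix <TT>p</TT> is replaced by a generated one, and the extra white space inside the start tag is gone.</P>

```python
import xml.etree.ElementTree as ET

# Prefix "p" and irregular white space inside the start tag.
source = "<p:doc xmlns:p='urn:example:ns'   b = \"1\"  a='2' ><p:item/></p:doc>"

root = ET.fromstring(source)
print(root.tag)  # {urn:example:ns}doc  (the prefix is not kept in the tree)

# Serializing the tree again invents a prefix and regular tag spacing.
reserialized = ET.tostring(root, encoding="unicode")
print(reserialized)
```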
<P>If an XML Signature is to be produced or verified on a system using DOM or
SAX processing, a canonical method is needed to serialize the relevant
part of a DOM tree or sequence of SAX events. XML canonicalization
specifications, such as [<A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML-c14n">XML-c14n</A>],
are based only on information which is preserved by DOM and SAX. For an XML
Signature to be verifiable by an implementation using DOM or SAX, not only must
the syntax constraints given in <A
href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#sec-XML-1">section 7.1</A>
be followed, but an appropriate XML canonicalization MUST be specified so that
the verifier can re-serialize DOM/SAX mediated input into the same byte sequence
that was signed.</P>
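<P>The Python sketch below shows canonical re-serialization. It substitutes the standard library's <TT>xml.etree.ElementTree.canonicalize</TT>, which implements the later W3C Canonical XML 2.0, for the XML-c14n draft cited above, and it uses SHA-256 arbitrarily. Two surface forms differ in attribute order, quoting, in-tag white space, and empty-element syntax. They canonicalize to the same string and therefore to the same digest over its UTF-8 encoding.</P>

```python
import hashlib
import xml.etree.ElementTree as ET

# Two surface forms of the same information.
form_a = "<doc  b='1' a=\"2\"><e/></doc>"
form_b = '<doc a="2" b="1"><e></e></doc>'

# Canonical XML 2.0 sorts attributes, uses double quotes,
# and writes empty elements as start/end tag pairs.
canon_a = ET.canonicalize(form_a)
canon_b = ET.canonicalize(form_b)
print(canon_a)  # <doc a="2" b="1"><e></e></doc>

# Digest the canonical form in a fixed character set (UTF-8).
digest_a = hashlib.sha256(canon_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(canon_b.encode("utf-8")).hexdigest()
print(digest_a == digest_b)  # True
```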
Received on Tuesday, 25 January 2000 16:18:34 UTC