- From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
- Date: Tue, 25 Jan 2000 16:18:30 -0500
- To: w3c-ietf-xmldsig@w3.org
<H2>7.0 <A name=sec-XML-Canonicalization>XML Canonicalization</A> and Syntax Constraint Considerations</H2> <P>Digital signatures only work if the verification calculations are performed on exactly the same bits as the signing calculations. If the surface representation of the signed data can change between signing and verification, then some way to standardize the changeable aspect must be used before signing and verification. For example, even for simple ASCII text there are at least three widely used line ending sequences. If it is possible for signed text to be modified from one line ending convention to another between the time of signing and signature verification, then the line endings need to be canonicalized to a standard form before signing and verification or the signatures will break. </P> <P>XML is subject to surface representation changes and to processing which discards some surface information. For this reason, XML digital signatures have a provision for indicating canonicalization methods in the signature so that a verifier can use the same canonicalization as the signer. </P> <P>Throughout this document we distinguish between the canonicalization of a <TT>Signature</TT> data object and other signed XML data objects. It is possible for an isolated XML document to be treated as if it were binary data so that no changes can occur. In that case, the digest of the document will not change and it need not be canonicalized if it is signed and verified as such. However, XML that is read and processed using standard XML parsing and processing techniques is frequently changed such that some of its surface representation information is lost or modified. In particular, this will occur in many cases for the <TT>Signature</TT> and enclosed <TT>SignedInfo</TT> elements since they, and possibly an encompassing XML document, will be processed as XML. 
</P> <P>Similarly, these considerations apply to <TT>Manifest</TT>, <TT>Object</TT>, and <TT>SignatureProperties</TT> elements if those elements have been digested, their <TT>DigestValue</TT> is to be checked, and they are being processed as XML.</P> <P>The kinds of changes in XML that may need to be canonicalized can be divided into three categories. There are those related to the basic [XML], as described in 7.1 below. There are those related to [DOM], [SAX], or similar processing as described in 7.2 below. And, third, there is the possibility of character set conversion, such as between UTF-8 and UTF-16, both of which all XML standards compliant processors are required to support. Any canonicalization algorithm should yield output in a specific fixed character set. For both the minimal canonicalization defined in this document and the W3C Canonical XML [<A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML-c14n">XML-c14n</A>], that character set is UTF-8. </P> <H3>7.1 <A name=sec-XML-1>XML 1.0</A>, Syntax Constraints, and Canonicalization</H3> <P>XML 1.0 [<A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML">XML</A>] defines an interface where a conformant application reading XML is given certain information from that XML and not other information. 
In particular, <OL> <LI>line endings are normalized to the single character #xA by dropping #xD characters if they are immediately followed by a #xA and replacing them with #xA in all other cases, <LI>missing attributes declared to have default values are provided to the application as if present with the default value, <LI>character references are replaced with the corresponding character, <LI>entity references are replaced with the corresponding declared entity, <LI>attribute values are normalized by <OL type=A> <LI>replacing character and entity references as above, <LI>replacing occurrences of #x9, #xA, and #xD with #x20 (space) except that the sequence #xD#xA is replaced by a single space, and <LI>if the attribute is not declared to be CDATA, stripping all leading and trailing spaces and replacing all interior runs of spaces with a single space, and </LI></OL> <LI>for elements declared to have element content, eliminate white space that appears within their content but not within the content of any enclosed element. </LI></OL> <P>Note that items (2), (4), (5C), and (6) depend on specific Schema, DTD, or similar declarations. In the general case, such declarations will not be available to or used by the signature verifier. Thus, to interoperate between different XML implementations, the following syntax constraints MUST be observed when generating any signed material to be processed as XML, including the <TT>SignedInfo</TT> element: <OL> <LI>attributes having default values be explicitly present, <LI>all entity references (except "amp", "lt", "gt", "apos", and "quot" which are pre-defined) be expanded, <LI>attribute value white space be normalized, and <LI>insignificant white space not be generated within elements having element content. 
</LI></OL> <H3>7.2 <A name=sec-DOM-SAX>DOM/SAX</A> Processing and Canonicalization</H3> <P>In addition to the canonicalization and syntax constraints discussed above, many XML applications use the Document Object Model [<A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-DOM">DOM</A>] or The Simple API for XML [<A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-SAX">SAX</A>]. DOM maps XML into a tree structure of nodes and typically assumes it will be used on an entire document with subsequent processing being done on this tree. SAX converts XML into a series of events such as a start tag, content, etc. In either case, many surface characteristics such as the ordering of attributes and insignificant white space within start/end tags are lost. In addition, namespace declarations are mapped over the nodes to which they apply, losing the namespace prefixes in the source text and, in most cases, losing where namespace declarations appeared in the original instance.</P> <P>If an XML Signature is to be produced or verified on a system using DOM or SAX processing, a canonical method is needed to serialize the relevant part of a DOM tree or sequence of SAX events. XML canonicalization specifications, such as [<A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#ref-XML-c14n">XML-c14n</A>], are based only on information which is preserved by DOM and SAX. For an XML Signature to be verifiable by an implementation using DOM or SAX, not only must the syntax constraints given in <A href="http://www.w3.org/Signature/Drafts/WD-xmldsig-core-20000114/Overview.html#sec-XML-1">section-7.1</A> be followed but an appropriate XML canonicalization MUST be specified so that the verifier can re-serialize DOM/SAX mediated input into the same byte sequence that was signed.</P>
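<P>The effect described in 7.2 can be illustrated with a minimal sketch. It is not the canonicalization defined by this draft: it assumes Python 3.8 or later and uses the standard library function <TT>xml.etree.ElementTree.canonicalize</TT>, which implements the later C14N 2.0 algorithm. The element content, the names <TT>FORM_A</TT>, <TT>FORM_B</TT>, <TT>raw_digest</TT>, and <TT>canonical_digest</TT>, and the choice of SHA-256 are all hypothetical and chosen only for illustration. Two surface forms that differ only in attribute order, quoting, and white space inside tags digest differently as raw bytes but identically once both are canonicalized to UTF-8.</P>

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two hypothetical surface forms of the same element. They differ only in
# attribute order, quote style, and white space inside the tags, which is
# exactly the information a DOM or SAX consumer does not preserve.
FORM_A = '<Reference URI="#obj" Type="example"><DigestValue>abc</DigestValue></Reference>'
FORM_B = "<Reference  Type='example'   URI='#obj' ><DigestValue>abc</DigestValue></Reference >"


def raw_digest(xml_text: str) -> str:
    """Digest the text exactly as written, treating it as opaque bytes."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def canonical_digest(xml_text: str) -> str:
    """Parse the text, re-serialize it in canonical form, then digest.

    Assumption: C14N 2.0 (the algorithm behind canonicalize) stands in
    for whichever canonicalization method the signer actually named.
    """
    canonical = canonicalize(xml_text)
    # The canonical form is digested as UTF-8, the fixed output encoding.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("raw digests equal:      ", raw_digest(FORM_A) == raw_digest(FORM_B))
    print("canonical digests equal:", canonical_digest(FORM_A) == canonical_digest(FORM_B))
```

<P>In this sketch the signer and verifier agree because both apply the same canonicalization before digesting, which is the reason the canonicalization method must be indicated in the signature.</P>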
Received on Tuesday, 25 January 2000 16:18:34 UTC